ABSTRACT

The Survey Science Centre of the XMM-Newton satellite released the first incremental version of the 2XMM catalogue in August 
2008. Containing more than 220,000 X-ray sources, the 2XMMi was at that time the largest catalogue of X-ray sources ever published
and thus constitutes an unprecedented resource for studying the high-energy properties of various classes of X-ray emitters such as 
AGN and stars. Thanks to the high throughput of the EPIC cameras on board XMM-Newton, accurate positions, fluxes, and hardness
ratios are available for a substantial fraction of the X-ray detections. The advent of the 7th release of the Sloan Digital Sky Survey
offers the opportunity to cross-match two major surveys and extend the spectral energy distribution of many 2XMMi sources towards 
the optical bands. This implies building extensive homogeneous samples with a statistically controlled rate of spurious matches and 
completeness. We here present a cross-matching algorithm based on the classical likelihood ratio estimator. The method developed
has the advantage of providing true probabilities of identifications without resorting to heavy Monte-Carlo simulations. Over 30,000 
2XMMi sources have SDSS counterparts with individual probabilities of identification higher than 90%. At this threshold, the sample 
has only 2% spurious matches and contains 77% of all expected SDSS identifications. Using spectroscopic identifications from the 
SDSS DR7 catalogue supplemented by extraction from other catalogues, we build an identified sample from which the way the 
various classes of X-ray emitters gather in the multi-dimensional parameter space can be analysed and later used to design a source
classification scheme. We illustrate the interest of this clean source sample by investigating two scientific use cases. In the first 
example we show how these multi-wavelength data can be used to search for new QSO2s. Although no specific range of observed
properties allows us to efficiently identify Compton Thick QSO2s, we show that the prospects are much better for Compton Thin
AGN2 and discuss several possible multi-parameter selection strategies. In a second example, we confirm the hardening of the mean 
X-ray spectrum with increasing X-ray luminosity on a sample of over 500 X-ray active stars and reveal that on average X-ray active 
M stars display bluer g - r colour indexes than less active ones. Although this catalogue of 2XMM-SDSS sources cannot be used 
directly for statistical studies, it nevertheless represents an excellent starting point to select well defined samples of X-ray-emitting 
objects. 



1. Introduction 

The growing collecting area and sensitivity of modern astronomical
detectors, combined with the increasing storage and processing
capabilities offered by current computer facilities, have made
possible the gathering on comparatively short time scales of very
large sky surveys that were beyond reach only a few years ago.
Most parts of the electromagnetic spectrum benefit from this 
evolution. Among recently completed or ongoing projects are 
the Two Micron All Sky Survey (2MASS) (Cutri et al. 2003) and
the Sloan Digital Sky Survey (Adelman-McCarthy et al. 2008)
for instance. Space-borne missions currently in operation such
as the Spitzer Space Telescope (Werner et al. 2004) observing in
the infrared or the Chandra (Weisskopf et al. 2000) and
XMM-Newton (Jansen et al. 2001) X-ray observatories are collecting



* The corresponding FITS file can be downloaded from the XCat-DB
home page (http://xcatdb.u-strasbg.fr/). The file also contains line
information for all SDSS spectroscopic entries matching a 2XMM source.
Results from the cross-correlation with the 2XMM DR3 are also
available at the same location.



at a high rate a wealth of measurements on an unprecedented 
number of objects in their energy range. In the relatively near 
future, ground-based automated very large telescopes such as
Pan-STARRS (Wang et al. 2010) or the Large Synoptic
Survey Telescope (Tyson 2002) will collect detailed photometric
information on a breathtaking number of faint galaxies.

Merging measurements arising from several instruments al- 
lows us to build spectral energy distributions in a range of wave- 
lengths extending over a large part of the electromagnetic spec- 
trum. The recent availability of wide angle surveys with high de- 
tection sensitivities allows us to measure with comparable accu- 
racies and in several scientifically important wavelength ranges 
the spectral energy density of the main classes of X-ray emit- 
ting astrophysical sources. Building large homogeneous samples 
provides valuable insight on the emission mechanisms and evo- 
lutionary processes and may allow the detection of rare objects 
or outliers, which would be otherwise hard to unveil in smaller 
samples. In this respect, a good estimate of the true rate of false 
cross-identification is important to assess the relevance of any 
group of outliers. 
However, the gathering of large groups of sources with well
characterised multi-wavelength properties first requires a proper 
handling of the cross-matching process between two or more 
catalogues. Although spatial resolution at high energies has steadily
increased in recent years and may go on improving in the
future, source density also grows as a result of the improved sen- 
sitivity, and the risk of confusion between unrelated objects de- 
tected at different wavelengths does not necessarily vanish. The 
confusion problem can be particularly arduous when compar- 
ing catalogues with very different spatial resolutions and den- 
sities, a problem often encountered in the identification process 
of high-energy sources which in several cases lack the superb 
spatial resolution affordable for instance in the optical domain, 
see e.g. Rutledge et al. (2000) for the identification of ROSAT
sources and Luo et al. (2010) for a recent example involving
multi-wavelength catalogues with different depths and angular
resolutions.

The XMM-Newton satellite (Jansen et al. 2001) was
launched by the European Space Agency late in 1999.
XMM-Newton is currently the X-ray (0.2-12 keV) telescope in
operation with the largest effective area. Three co-aligned telescopes
feed two EPIC MOS (Turner et al. 2001) and one EPIC pn
(Strüder et al. 2001) cameras. Two reflection grating arrays divert
about half of the X-ray photons from the EPIC MOS cameras
towards two Reflection Grating Spectrometers (RGS; den
Herder et al. 2001). An Optical Monitor (OM; Mason et al.
2001), providing UV and optical images of a fraction of the field
of view covered by the EPIC cameras down to magnitude 21,
complements the X-ray instrumentation. One of the
remarkable properties offered by the X-ray telescopes on-board
XMM-Newton is to provide a large field of view of 30' diam- 
eter with a weakly degraded image point-spread function and 
low vignetting even at large off-axis angles. Accordingly, a large 
number of sources may be serendipitously discovered around 
the main target of the observation, which builds up to make an 
X-ray survey with an unprecedented combination of sensitivity 
and area covered. Starting from the beginning of the project, 
ESA recognised the high scientific interest of exploiting the 
XMM-Newton survey and appointed the present Survey Science 
Centre (SSC) on a competitive basis. Led by the University of
Leicester, the SSC is a consortium of ten European institutes
conducting its activity on behalf of ESA. The SSC responsibilities
have been presented in Watson et al. (2001). One of the most
demanding tasks given to the consortium is the compilation of a 
catalogue of all sources serendipitously discovered in the field of 
view of the X-ray instruments and of their characterisation and 
identification at least in a statistical way. 

Several spectroscopic identification campaigns and multi- 
wavelength studies have been recently performed by the SSC 
on samples of thousands of EPIC sources using follow-up
observations at 4-m and 8-m class telescopes. The availability
of the recently published SDSS Data Release 7 (DR7) and of
the incremental version of the 2XMM catalogue (2XMMi)
offers a unique opportunity to extend the identification work to a
much larger sky area. With its spectroscopic and photometric
limiting magnitude about 2 magnitudes brighter than that
typically reached for the SSC source samples, SDSS identifica- 
tions of XMM-Newton sources conveniently expand the identi- 
fied sample towards brighter magnitudes and at the same time 
provide access to a rich group of accurately quantified photo- 
metric and spectroscopic data. 

As part of its scientific activities, the Survey Science Centre
of the XMM-Newton satellite has developed a specific
cross-correlation algorithm yielding actual probabilities of
identification based on positional coincidence and applied this algorithm
to the cross-identification of the 2XMMi and SDSS DR7
catalogues, thus creating one of the largest sets of optically identified
X-ray sources available so far. The result of the cross-correlation
is made available as a separate FITS file and is also available
through the XCat-DB (Motch et al. 2007; Michel et al. 2009).



The first sections of this paper present the details of the al- 
gorithm used to identify 2XMMi X-ray sources with SDSS DR7 
optical objects. We apply the commonly used likelihood ratio to 
quantify the chance that a SDSS object is the counterpart of the 
X-ray source. Identification probabilities are computed with an 
original method that does not rely on Monte Carlo simulations 
and thus offers a better efficiency when cross-correlating large 
sets of data. We then describe the range of optical and X-ray 
parameters occupied by the main astrophysical classes of X-ray 
emitters and show how source classification could be achieved 
on this basis. In the last part of this paper, we investigate two 
example science cases, the search for new QSO2s, and the study
of the properties of the X-ray active late-type star population. 



2. Description of the cross-correlated catalogues 

2.1. 2XMMi catalogue 

The incremental Second XMM-Newton Serendipitous Source
Catalogue (2XMMi) is an extended version of the 2XMM
Catalogue (Watson et al. 2009). It has been built from 4117
individual pointed observations performed by the XMM-Newton
Observatory and contains 289 083 heterogeneous detections for
a total of 221 012 unique X-ray sources. The catalogue covers
~1% of the sky over a large range of Galactic latitudes and
longitudes. Owing to the wide range of exposure times, the area
covered depends sensitively on limiting flux and energy range
(see Fig. 8 in Watson et al. 2009). A 90% complete relative
sky coverage is reached at $F_X = 1$ and $9 \times 10^{-14}$ erg cm$^{-2}$ s$^{-1}$ in
the 0.5-2.0 keV and 2.0-12.0 keV bands respectively. The EPIC
cameras encompass a field of view (FOV) of ~30' diameter and are
sensitive in the energy range of ~0.2-12 keV. Source
positions have a typical accuracy of ~2". In this paper, we limit
our analysis to point-like sources with a positional error smaller
than or equal to 5". A source is defined as point-like if its extent
maximum likelihood parameter (ep_ext_ml) is < 4. The
resulting 2XMMi source sample consists of 264 361 detections and
200 067 unique 2XMMi sources.



2.2. SDSS Data Release 7 

The Seventh Data Release of the Sloan Digital Sky Survey
(Abazajian et al. 2009) covers 11 663 deg$^2$, mostly in the
northern Galactic cap. A total of 357 million objects have 5-band
photometry, among which 1.6 million galaxies, quasars, and stars
were spectroscopically observed. Most of the ~2000 deg$^2$
increment over Data Release 6 is located at low Galactic latitude.
Astrometric errors are < 0.1" rms. At the 3% error level, the
catalogue reaches magnitude limits in the range of 20.5 to 22.2
in the five photometric bands (u, g, r, i and z). In this
paper we only consider the so-called primary sources of the SDSS
DR7 Photometric Catalogue as available from the VizieR data 
server. Primary sources are the "main" detection of an object 
and have the best defined set of parameters. For most scientific 
applications, the primary detections are the only ones needed. 
Source lists have been extracted using the VO ConeSearch
protocol. The central point of each query is the centre of the FOV of
the XMM-Newton observation considered and the search radius
is the distance from the centre to the farthest X-ray source, to
which we add 3' for completeness.
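
The query geometry described above can be sketched as follows. This is only an illustrative sketch, not the code used to build the catalogue: the function names, the input layout (positions as (RA, Dec) pairs in degrees) and the use of the Haversine formula for this particular step are assumptions made for the example.

```python
import math

def haversine_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    # min() guards against rounding slightly above 1 for antipodal points
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def cone_search_radius_arcmin(fov_centre, xray_positions, margin_arcmin=3.0):
    """Distance from the FOV centre to the farthest X-ray source, plus a margin.

    fov_centre and xray_positions are hypothetical inputs: an (RA, Dec) pair
    and a list of (RA, Dec) pairs, all in degrees.
    """
    ra0, dec0 = fov_centre
    farthest_deg = max(haversine_deg(ra0, dec0, ra, dec) for ra, dec in xray_positions)
    return farthest_deg * 60.0 + margin_arcmin
```

The returned radius, together with the FOV centre, would be the two parameters passed to a cone search service.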



3. Counterpart identification procedure 

We discuss in Sect. 3.1 how we select optical candidates, taking into
account arbitrary error ellipses on the source's spherical
coordinates. We compute a likelihood ratio (LR) for each
target-candidate pair (Sect. 3.2). This LR involves a measure of the local
density using a kernel smoothing method (Appendix B). Estimating
the true LR distribution for spurious associations (Sect. 3.3) then
allows us to compute for each target-candidate pair the probability
of association only based on positional coincidence (Sect. 3.4).



3.1. Selection of optical candidates 
3.1.1. Selection criterion 

We consider a target X-ray source and a candidate optical source
with $\alpha_X$, $\delta_X$ the equatorial coordinates of the X-ray source;
$\sigma_{\alpha_X}$, $\sigma_{\delta_X}$ and $\rho_{X_{\alpha,\delta}}$ the error on $\alpha_X \cos\delta_X$ and on $\delta_X$ and the correlation
between $\sigma_{\alpha_X}$ and $\sigma_{\delta_X}$ respectively; $\alpha_O$, $\delta_O$ the equatorial
coordinates of an optical source; $\sigma_{\alpha_O}$, $\sigma_{\delta_O}$ and $\rho_{O_{\alpha,\delta}}$ the error on $\alpha_O \cos\delta_O$
and on $\delta_O$ and the correlation between $\sigma_{\alpha_O}$ and $\sigma_{\delta_O}$, respectively.

As is implicitly done in most works, with the exception of Budavári & Szalay
(2008), we convert the spherical problem into a plane one
and positional errors are interpreted as usual 2D Gaussians.
We have chosen a projection on a 2D plane with a frame
centred on the position of the X-ray source and having for
x-axis the direction of the optical candidate (Fig. 1). Errors on
positions become Gaussians: $N_X(x, y; \sigma_{x_X}^2, \sigma_{y_X}^2, \rho_X\sigma_{x_X}\sigma_{y_X})$ and
$N_O(x - d, y; \sigma_{x_O}^2, \sigma_{y_O}^2, \rho_O\sigma_{x_O}\sigma_{y_O})$ with d the angular distance
between the X-ray and the optical source. As suggested by Sinnott
(1984), d is computed using the Haversine function. The
transformation of the ellipses in the new reference frame is described
in Appendix A.

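The density of probability that the two sources are at the
same location, and thus are the same object, is given by the
convolution product of these two distributions. It leads to a new
Gaussian:

$$P(x, y) = N_c(x, y; \sigma_{x_c}^2, \sigma_{y_c}^2, \rho_c\sigma_{x_c}\sigma_{y_c}), \qquad (1)$$

with $\sigma_{x_c}^2 = \sigma_{x_X}^2 + \sigma_{x_O}^2$, $\sigma_{y_c}^2 = \sigma_{y_X}^2 + \sigma_{y_O}^2$ and
$\rho_c\sigma_{x_c}\sigma_{y_c} = \rho_X\sigma_{x_X}\sigma_{y_X} + \rho_O\sigma_{x_O}\sigma_{y_O}$.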

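If the optical source is the counterpart of the X-ray source, it
falls with a probability $\gamma$ inside the ellipse defined by the
equation

$$\begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} \sigma_{x_c}^2 & \rho_c\sigma_{x_c}\sigma_{y_c} \\ \rho_c\sigma_{x_c}\sigma_{y_c} & \sigma_{y_c}^2 \end{pmatrix}^{-1} \begin{pmatrix} x \\ y \end{pmatrix} = k_\gamma^2. \qquad (2)$$

The completeness we have chosen is a 3$\sigma$ criterion, often used
as a compromise between the total number of associations and
the number of counterparts missed (0.3%). This completeness,
$\gamma = 99.7\%$, leads in 2D to $k_\gamma = 3.43935$. In the frame we have
chosen, the coordinates of the optical source are x = d and y = 0.
The selection criterion we adopt will retain all candidates
satisfying

$$\frac{d}{\sigma_{x_c}\sqrt{1 - \dfrac{(\rho_c\sigma_{x_c}\sigma_{y_c})^2}{\sigma_{x_c}^2\sigma_{y_c}^2}}} \le k_\gamma. \qquad (3)$$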
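
For a 2D Gaussian, the probability of falling inside the ellipse of normalised radius k is $1 - e^{-k^2/2}$, which is how a 3$\sigma$ (1D) completeness translates into $k_\gamma \approx 3.43935$. The sketch below illustrates this conversion and the selection test on the candidate at (x, y) = (d, 0). Function and argument names are assumptions made for the example; the covariance terms are assumed to be already expressed in the projected frame of Fig. 1.

```python
import math

# Probability enclosed within 3 sigma for a 1D Gaussian (about 0.9973).
GAMMA_3SIGMA = math.erf(3.0 / math.sqrt(2.0))

def k_gamma(gamma):
    """Normalised radius of the 2D error ellipse enclosing a probability gamma."""
    return math.sqrt(-2.0 * math.log(1.0 - gamma))

def is_candidate(d, var_x, var_y, cov_xy, k):
    """Selection test for a candidate located at (d, 0) in the projected frame.

    var_x, var_y and cov_xy are the terms of the convolved Gaussian:
    sigma_xc^2, sigma_yc^2 and rho_c * sigma_xc * sigma_yc.
    """
    # sigma_xc * sqrt(1 - rho_c^2), written with the covariance term
    sigma_eff = math.sqrt(var_x - cov_xy ** 2 / var_y)
    return d <= k * sigma_eff
```

A correlated error ellipse shrinks the effective radius along the x-axis, so a candidate accepted with uncorrelated errors may be rejected once the correlation term is included.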




Fig. 1. Chosen projection plane: the xy frame is centred on the 
X-ray source position X; the x-axis is the direction towards the 
optical candidate, located at point O. d is the angular distance 
between the two sources. This frame is useful at high
declinations, where the meridians (the directions of the north pole
at X and at O) can no longer be considered parallel. It
allows us to deal naturally with the poles.

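Table 1. Summary of the astrometric parameters for the 2XMM
catalogue and for the SDSS DR7.

                          2XMMi                                                 SDSS DR7
$\sigma_\alpha$           $\sqrt{\mathrm{radec\_err}^2 + \mathrm{syserr}^2}$    raErr
$\sigma_\delta$           $\sqrt{\mathrm{radec\_err}^2 + \mathrm{syserr}^2}$    decErr
$\rho_{\alpha,\delta}$    0                                                     raDecCorr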



We make the following additional hypotheses. First, we
neglect any systematic offset between the positions of the two
catalogues. The 2XMMi catalogue as a whole is free of any
systematic positional offset in a direction of the sky. This has
been checked by cross-correlating the 2XMM catalogue with
the SDSS DR5 Quasar catalogue (Watson et al. 2009). For a
large number of cases (74% at |b| > 20°), it was possible to
correct the astrometry by cross-correlating field X-ray sources
with USNO-B1.0 entries. When no reliable astrometric
correction could be found, increasing the applied systematic error from
0.35" to 1.0" accounts for the possible remaining coordinate
offset and rotation affecting all the sources detected in a given
observation (Watson et al. 2009). Second, we assume that all
positions and associated errors have been computed at the same
epoch and therefore corrected for proper motions.

3.1.2. Application to XMM-SDSS DR7 data

The 2XMMi catalogue provides a circular error on position
(radec_err) and a systematic error (syserr) for each source. The
error on positions of each X-ray source is the quadratic sum of
these two values:

$$\sigma_{\alpha_X} = \sigma_{\delta_X} = \sqrt{\mathrm{radec\_err}^2 + \mathrm{syserr}^2}. \qquad (4)$$

Because it is symmetric, we have $\rho_{X_{\alpha,\delta}} = 0$, $\rho_X = 0$ and thus
$\rho_c\sigma_{x_c}\sigma_{y_c}$ is directly equal to $\rho_O\sigma_{x_O}\sigma_{y_O}$.

Positional errors are elliptical in the SDSS DR7 catalogue:
$\sigma_{\alpha_O}$ = raErr, $\sigma_{\delta_O}$ = decErr and $\rho_{O_{\alpha,\delta}}$ = raDecCorr. The
definitions of the different parameters are summarized in Table 1.
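
As a minimal sketch of how the catalogue columns of Table 1 combine, the function below adds the circular X-ray variance to the SDSS error ellipse. The function name and the assumption that all errors are given in the same unit (e.g. arcsec) are ours; the result is expressed in the local ($\alpha\cos\delta$, $\delta$) frame, i.e. before the rotation towards the candidate described in Appendix A. Since a circular error is unchanged by a rotation, the X-ray term can be added in either frame.

```python
def combined_covariance(radec_err, syserr, ra_err, dec_err, ra_dec_corr):
    """Covariance terms (var_alpha, var_delta, cov) of the convolved Gaussian.

    radec_err, syserr: 2XMMi columns (circular X-ray error).
    ra_err, dec_err, ra_dec_corr: SDSS DR7 columns raErr, decErr, raDecCorr.
    All errors are assumed to share the same unit.
    """
    var_xray = radec_err ** 2 + syserr ** 2      # quadratic sum of the two X-ray terms
    cov_optical = ra_dec_corr * ra_err * dec_err  # only the optical ellipse is correlated
    return (var_xray + ra_err ** 2,
            var_xray + dec_err ** 2,
            cov_optical)
```

With typical values (X-ray errors of order 1", SDSS errors of order 0.1"), the combined ellipse is dominated by the X-ray term and is nearly circular.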
3.2. Likelihood ratio

We compute a likelihood ratio (LR) for each target-candidate
pair meeting the criterion of Eq. (3): the probability of finding
the optical counterpart at a normalised distance r (see below)
divided by the probability of having a spurious object at that
distance.

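The density of probability that the two sources are at the
same location knowing x and y corresponds to the density of
probability of having the counterpart in x and y, assuming that
it is the same astrophysical object as the X-ray emitting one.
The Gaussian $N_c$ (Eq. 1) can be written in its canonical form

$$\frac{1}{2\pi\sigma_M\sigma_m}\exp\left[-\frac{1}{2}\left(\frac{x_1^2}{\sigma_M^2} + \frac{y_1^2}{\sigma_m^2}\right)\right],$$

where $\sigma_M$ and $\sigma_m$ are the semi-major
and semi-minor axes, in the eigenvector frame $(x_1, y_1)$, given by
the eigendecomposition of the variance-covariance matrix of $N_c$,

$$\sigma_{M,m}^2 = \frac{1}{2}\left(\sigma_{x_c}^2 + \sigma_{y_c}^2 \pm \sqrt{(\sigma_{x_c}^2 - \sigma_{y_c}^2)^2 + 4(\rho_c\sigma_{x_c}\sigma_{y_c})^2}\right). \qquad (5)$$

We change the scale and switch to polar coordinates, which leads
to the dimensionless Rayleigh distribution:

$$r\,e^{-r^2/2}, \quad \text{with} \quad r = \sqrt{\frac{x_1^2}{\sigma_M^2} + \frac{y_1^2}{\sigma_m^2}}. \qquad (6)$$

Therefore, the new elementary surface becomes $\pi\sigma_M\sigma_m$, the
surface of the 1$\sigma$ (or r = 1) ellipse.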
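
The eigendecomposition and the normalised distance can be illustrated with the short sketch below. The function names are assumptions made for the example. The distance is computed directly from the covariance terms as a quadratic form, which is mathematically equivalent to evaluating $\sqrt{x_1^2/\sigma_M^2 + y_1^2/\sigma_m^2}$ in the eigenvector frame and avoids an explicit rotation.

```python
import math

def ellipse_axes(var_x, var_y, cov_xy):
    """Semi-major and semi-minor axes of the 1-sigma ellipse.

    They are the square roots of the eigenvalues of the 2x2
    variance-covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    """
    half_sum = 0.5 * (var_x + var_y)
    half_diff = 0.5 * math.sqrt((var_x - var_y) ** 2 + 4.0 * cov_xy ** 2)
    sigma_major = math.sqrt(half_sum + half_diff)
    sigma_minor = math.sqrt(max(half_sum - half_diff, 0.0))  # guard against rounding
    return sigma_major, sigma_minor

def normalised_distance(dx, dy, var_x, var_y, cov_xy):
    """Dimensionless distance r for an offset (dx, dy) in the covariance frame."""
    det = var_x * var_y - cov_xy ** 2
    q = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(q)
```

In the frame of Fig. 1 the offset of the optical candidate is simply (dx, dy) = (d, 0).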

The LR we use is inspired by the one described in De Ruiter
et al. (1977). Like Wolstencroft et al. (1986), we do not only
consider the first candidate, but all sources satisfying Eq. (3). We thus
replace the probability "of finding the first confusing object at
a distance lying between r and r + dr" by the one of finding a
confusing object between r and r + dr.

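The probability of finding the optical counterpart (cp) at a
distance lying between r and r + dr is

$$dp(r|cp) = r\,e^{-r^2/2}\,dr. \qquad (7)$$

And the probability of finding a spurious object (spur) between
r and r + dr is given by the Poisson law:

$$dp(r|spur) = 2\lambda r\,dr, \qquad (8)$$

where $\lambda$ is the expected number of unrelated sources inside the
r = 1 ellipse, i.e. the local surface density of sources multiplied
by the elementary surface $\pi\sigma_M\sigma_m$.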



We adopt the local surface density of sources at least as bright
as $m_o$, the magnitude of the candidate. Because more sources are
available in a given area, the densities computed with this
method are more local, or more accurate, than densities
computed in arbitrary bins of magnitudes. It is equivalent to
computing local densities using increasingly sensitive instruments. We
detail in Appendix B the method used to estimate local densities.

The likelihood ratio is the ratio of the two probability
densities (7) and (8):

$$LR(r) = \frac{dp(r|cp)}{dp(r|spur)} = \frac{1}{2\lambda}\,e^{-r^2/2}. \qquad (9)$$
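
A minimal numerical sketch of this likelihood ratio is given below. It assumes that $\lambda$ is the local surface density of sources at least as bright as the candidate multiplied by the surface $\pi\sigma_M\sigma_m$ of the r = 1 ellipse; the function name, argument names and unit convention (axes in arcsec, density in sources per arcsec$^2$) are choices made for the example.

```python
import math

def likelihood_ratio(r, sigma_major, sigma_minor, local_density):
    """LR(r) = exp(-r^2 / 2) / (2 * lambda).

    lambda is taken as the expected number of unrelated sources inside the
    r = 1 ellipse: local_density * pi * sigma_major * sigma_minor.
    sigma_major and sigma_minor must use the length unit in which
    local_density is expressed (e.g. arcsec and sources per arcsec^2).
    """
    lam = math.pi * sigma_major * sigma_minor * local_density
    return math.exp(-0.5 * r * r) / (2.0 * lam)
```

For a fixed error ellipse, the LR decreases with the normalised distance and with the local density of possible confusing objects.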



The formalism we apply here aims at providing probabilities
of identification based on positional coincidences only. A
Bayesian interpretation of the likelihood ratio method is
described in Appendix C. We do not use other information on
sources such as the spectral energy distribution. Hence, we do
not add an extra term q(m) to the LR as is done for example in
Wolstencroft et al. (1986), Sutherland & Saunders (1992) and
Brusa et al. (2007). The quantity q(m) corresponds to the
probability of having among the real counterparts a source of
magnitude m, or in a bin $\Delta m$ around m (see the corresponding formula
in Appendix C). In this case, q(m) should be local, but then becomes
hard to estimate. In general the estimate of q(m) is plagued with
considerable errors which, besides the error on the local density
estimation, dramatically affect the error on LR. We will see in
Sect. 3.3 that the q(m) factor is to some extent taken into account in
our reliability function.

3.3. Computing reliabilities 

Although we use a different LR definition, a different estimator
of the rate of spurious associations and a different function to
fit the reliability histogram, we broadly follow the work
presented in part 3 of Oyabu et al. (2005). The method originates
in Rutledge et al. (2000).

We define the reliability of an association in a given bin of
LR as

$$R(LR) = \frac{N_{real}(LR)}{N_{real}(LR) + N_{spur}(LR)} = \frac{N_{cand}(LR) - N_{spur}(LR)}{N_{cand}(LR)}, \qquad (10)$$

where $N_{real}$ and $N_{spur}$ are the unknown numbers of candidates
which are respectively real and spurious counterparts in a given
bin of LR; $N_{cand}$ is the number of candidates in a given bin of
LR.
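
Given the two histograms (number of candidates and estimated number of spurious associations per LR bin), the reliability per bin follows directly from the definition above. The sketch below is illustrative only: the function name, the choice of returning None for empty bins and the clipping of the result to [0, 1] (useful when the estimated number of spurious associations fluctuates above the number of candidates) are assumptions, not part of the method as described.

```python
def reliability_histogram(n_cand, n_spur):
    """Reliability per LR bin: (N_cand - N_spur) / N_cand.

    n_cand: observed number of candidates in each bin.
    n_spur: estimated number of spurious associations in each bin.
    Empty bins yield None; values are clipped to the [0, 1] interval.
    """
    reliabilities = []
    for cand, spur in zip(n_cand, n_spur):
        if cand <= 0:
            reliabilities.append(None)
        else:
            reliabilities.append(min(1.0, max(0.0, (cand - spur) / cand)))
    return reliabilities
```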

We therefore have to estimate $N_{spur}$. An often used method
consists in correlating X-ray sources with artificial samples of
optical sources. The generated samples have the same
characteristics as the real sources: same density, same positional errors
distribution, etc. Positions are randomly distributed. The sum of
the results of these Monte-Carlo samples provides an estimate
of the number of spurious associations as function of the
distances, of the LRs, etc. This approach is used by Oyabu et al.
(2005) among others. In Stephen et al. (2005) the random
sample consists in a list of "anti [...] sources", which are "mirrored
in Galactic longitude and latitude".

We propose here a new method to estimate the number
of spurious associations, not based on Monte-Carlo
simulations, but instead directly computing their expected results.
This scheme offers a better computing efficiency when
cross-correlating huge sets of data. The basic idea of estimating the
surface of an association related to the total available area can
be found in Boller et al. (1998). The method is described in
Appendix D.

In order to avoid computing too many local densities for es- 
timating the rate of spurious associations, we divide the magni- 
tude range into bins and associate all sources in the same mag- 
nitude bin with the mean value of their local density. The width 
of the bins depends on the magnitude accuracy of the catalogue. 
We then compute for all optical and X-ray sources the $\sigma_M\sigma_m$
factor, the $LR_{min}$ and $LR_{max}$. It is thus possible to compute the
histogram of the expected number of spurious associations
according to LR values. To increase the computing efficiency, we
can bin the $\sigma_M\sigma_m$ values. However, this approach involves
another loss of accuracy for a meagre reduction of computing time.

As shown in Fig. 2, the histograms used in the computation
of the reliability are the number of candidates and the number of
spurious associations grouped in bins of $\log_{10} LR$.

3.3.1. Fitting the reliability function 

In order to estimate the number of spurious associations we take
the relatively realistic example where the X-ray source has at
most one candidate. The reliability of an association (not to be
confused with the integrated reliability for all associations
having an LR above a given value) is directly given by (see Eq. (C.6) of the appendix)

$$R(LR) = p(id|r) = \frac{1}{1 + \dfrac{p(spur)}{p(cp)}\,\dfrac{1}{LR}}. \qquad (11)$$



The term p(cp)/p(spur), the probability that the optical source
is a counterpart divided by the probability that it is spurious,
must be independent of the dimensionless distance r (see Eq. 6).
It is similar to the term $(1 - \theta)/\theta$ used in De Ruiter et al. (1977).
However, p(cp)/p(spur) may depend on the nature of the
underlying X-ray source population (e.g. stars, AGN) and may thus
vary with source properties such as magnitude, optical colour, or
flux ratios. In order to obtain a LR similar to that used in Brusa
et al. (2007) for instance, we would need to consider an
additional parameter q(m) describing the variation of p(cp)/p(spur)
with the magnitude (or any other relevant property) of the
candidate counterpart. Alternatively, q(m) may be replaced by another
term, such as the one playing an equivalent role in Budavári & Szalay
(2008). An R(LR) histogram can be built from the $N_{cand}(LR)$ and
$N_{spur}(LR)$ histograms made using the method explained in the
previous paragraph. If the ratio p(cp)/p(spur) were independent
of source properties, R(LR) could be fitted with Eq. (11) using
only one free parameter a = p(cp)/p(spur). Including a term
q(m) in LR with $N_{\Delta m}$ bins of magnitude requires us to build $N_{\Delta m}$
$R(LR, \Delta m)$ histograms and fit each of them with functions in Eq.
(11) having different a parameters. However, in general the lack
of statistics does not allow us to do so.

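The R(LR) histogram can then be seen as the sum of $N_{\Delta m}$
$R(LR, \Delta m)$ histograms and consequently can be modelled by the
function

$$R_f(LR) = \sum_{i=1}^{N_{\Delta m}} \frac{b_i}{1 + \dfrac{1}{a_i\,LR}}, \quad \text{with} \quad b_i = \frac{N_{cand_i}}{\sum_{j=1}^{N_{\Delta m}} N_{cand_j}}, \qquad (12)$$

where $N_{cand_i}$ is the total number of entries in histogram number i.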
In practice, the histograms are not binned according to LR
but to $\log_{10}(LR)$. Best fits were obtained using the function

$$R_f(x = \log_{10}(LR)) = \sum_{i=1}^{N_{\Delta m}} \frac{b_i}{1 + a'_i\,10^{-x}}, \qquad (13)$$

with $N_{\Delta m} = 3$, i.e. 6 free parameters.

The fit is performed using a Levenberg-Marquardt algorithm.
We compute the same number of LR and construct and fit the
same number of LR histograms as there are magnitude bands in
the SDSS.
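
The fitted function can be written compactly as below. This is a sketch of the model only, not of the fitting code: the function name and the representation of the free parameters as a list of $(a'_i, b_i)$ pairs are assumptions made for the example.

```python
def reliability_model(x, params):
    """Sum of sigmoid-like terms b_i / (1 + a'_i * 10**(-x)), with x = log10(LR).

    params: list of (a_prime, b) pairs, one pair per term of the sum
    (three pairs give the six free parameters quoted in the text).
    """
    return sum(b / (1.0 + a_prime * 10.0 ** (-x)) for a_prime, b in params)
```

Each term rises from 0 at low LR to $b_i$ at high LR, so the model tends to the sum of the $b_i$ for large x. The residuals between this model and the reliability histogram could then be passed to any Levenberg-Marquardt routine, for instance SciPy's `scipy.optimize.least_squares` with `method="lm"`.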

3.4. Computing probabilities of identification in the general
case

We now extend the Bayesian approach to X-ray sources
having $N_{cand}$ candidates. We assume that at most one association is
real. This assumption should be fulfilled in our case for at least
two reasons. First, we only consider point-like X-ray sources.
This condition decreases the probability that the detection
results from two distinct unresolved sources blended in the
XMM-Newton beam. Second, 95% of the XMM-Newton sources
matching a SDSS entry with a probability higher than 90% have
a 0.5-2.0 keV flux higher than $1.65 \times 10^{-15}$ erg cm$^{-2}$ s$^{-1}$. At this
flux, source confusion is of the order of a few percent only
(Cappelluti et al. 2009). The corresponding source density of
~500 deg$^{-2}$ (Cappelluti et al. 2009) is well below the value of
2000 deg$^{-2}$, above which simulations show that source
confusion becomes important (Loaring et al. 2005). Similar
conclusions can be drawn for the hard (2-12 keV) sources.

Fig. 2. Top: Histograms of the number of associations and of
the estimate of the number of spurious associations by bin of
$\log_{10}(LR)$. Bottom: Reliability histogram by bin of $\log_{10}(LR)$
and its fitted curve. In the example (plot title: SDSS DR7 rmag,
Group-0.35-30.0-45.0), LRs have been computed
according to the SDSS DR7 r magnitude for XMM sources
with a systematic error of 0.35 arcsec with a galactic latitude
30° < |b| < 45°.

Let us consider $N_{cand} + 1$ hypotheses:

- $H_{cp_i}$: the i-th optical source is the counterpart;

- $H_{spur_{all}}$: there is no counterpart.



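Then the Bayesian probability that the i-th source is the
counterpart knowing $r_j$, $j \in [1, N_{cand}]$, is

$$P_{id,i} = R'_i = p(H_{cp_i} | r_1 \cap \ldots \cap r_{N_{cand}}) = \frac{LR_i\,\dfrac{p(H_{cp_i})}{p(H_{spur_{all}})}}{1 + \sum_{j=1}^{N_{cand}} LR_j\,\dfrac{p(H_{cp_j})}{p(H_{spur_{all}})}}. \qquad (14)$$

If $p(H_{cp_i})/p(H_{spur_{all}}) = p(cp_i)/p(spur_i)$, Eq. (14) leads to the formula below,
obtained following the Rutledge et al. (2000) prescription:

$$R'_i = \frac{\dfrac{R_i}{1 - R_i}}{1 + \sum_{j=1}^{N_{cand}} \dfrac{R_j}{1 - R_j}}. \qquad (15)$$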



With Eq. (11), we easily show that R/(1 - R) = LR p(cp)/p(spur).
Computing $p(H_{cp_i})$ and $p(H_{spur_{all}})$ by normalising
the terms $p(cp_i)$, $p(spur_i)$ as Rutledge et al. (2000) do to
construct $R'_i$ from $R_i$, we obtain the equality

$$\frac{p(H_{cp_i})}{p(H_{spur_{all}})} = \frac{p(cp_i)}{p(spur_i)}.$$

We thus apply Eq. (15) to compute the final probabilities of
identification.
Each candidate possesses as many reliabilities as there are
magnitude bands in the SDSS. The $R_i$ we consider in the final
probability of identification formula are for each source the best
of all photometric bands.
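
The final step, turning the individual reliabilities of the candidates of one X-ray source into probabilities of identification, can be sketched as follows. The function name is an assumption, as is the small clamp that keeps a reliability of exactly 1 from producing a division by zero. The input list is assumed to already hold, for each candidate, the reliability retained among the photometric bands.

```python
def identification_probabilities(reliabilities, eps=1e-12):
    """Probabilities R'_i = odds_i / (1 + sum_j odds_j), with odds = R / (1 - R).

    reliabilities: individual reliabilities R_i of all candidates of one
    X-ray source. The probability that none of the candidates is the
    counterpart is 1 / (1 + sum_j odds_j).
    """
    odds = []
    for r in reliabilities:
        r = min(r, 1.0 - eps)  # avoid an infinite odds ratio when R == 1
        odds.append(r / (1.0 - r))
    norm = 1.0 + sum(odds)
    return [o / norm for o in odds]
```

With a single candidate the formula returns the reliability itself, while two candidates of reliability 0.5 each obtain a probability of 1/3, the remaining 1/3 being the probability that there is no counterpart.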

4. Observation grouping 

XMM-Newton EPIC sources are correlated FOV by FOV, i.e.
observation by observation. In order to reduce the counting noise in
the bins of the per-FOV LR histograms without sacrificing resolution, we have
to increase count statistics. We therefore stacked data from
similar FOV:

- We split into two groups XMM-Newton FOV with different 
systematic errors on position: 0.35" or 1.0". 

- Observations of the LMC and SMC regions are set apart. 

- Because they presumably share objects of same nature 
and same patterns of logN-logS relation, observations are 
grouped according to their galactic latitude. 

As mentioned above, the relation between reliability and 
likelihood ratio depends on the overall properties of the X-ray 
populations present in the optical sample. In addition to galactic 
latitude, we also tested whether the X-ray flux could significantly 
modify the shape of the R(LR) curves. Splitting further 2XMMi 
sources into groups of medium (10"'"^-10"'^ ergcm"^ s"') and 
faint (lO^'^-lO"'"* ergcm"^ s"') 0.2-12 keV flux ranges does not 
change the probabilities of identification by more than ~ 3% in 
most cases. The only noticeable difference is for faint sources 
with identification probabilities below 50%, which tend to show 
even lower identification probabilities by as much as 15%. We 
felt, however, that since the effect is relatively modest and only 
affects sources for which the significance of the identification 
is rather low, priority should be given to the gathering of suf- 
ficiently large subsamples. We thus did not consider any X-ray 
flux dependency in the final implementation. 

5. Results of the 2XMMi-SDSS DR7
cross-correlation

A total of 1337 XMM-Newton FOV hold at least one source with
a SDSS counterpart candidate within the combined 3σ search
radius. These 1337 FOV contain 95 452 detections, corresponding
to 73 636 unique 2XMMi sources. The cross-correlation of the
2XMMi catalogue with the SDSS DR7 leads to 72 169 "associations"
involving 45 727 unique 2XMMi sources and 55 726 unique SDSS DR7
sources. The former number represents 20% of the unique sources
available in the entire 2XMMi catalogue and 62% of those in the
1337 FOV. The distribution of the number of SDSS DR7 candidates
per unique 2XMMi source is given in Table 2, and the main
properties of the distribution of the probabilities of
identification and of their cumulative values are given in
Table 3. We define the sample completeness as the fraction of
2XMMi sources having an individual probability of identification
in SDSS above a given cutoff relative to the total number of
2XMMi sources with SDSS counterparts. In a similar manner, sample
reliability is the fraction of non-spurious associations among
2XMMi sources having an individual probability of SDSS
association above a given threshold. A total of 7740 unique
2XMMi sources have several SDSS DR7 candidates. In this
sub-sample, there are 896 2XMMi sources for which the candidate
with the highest identification probability is not the nearest
SDSS DR7 candidate, and 2672 for which it is not the brightest.
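
The two definitions above can be illustrated with a short sketch. It assumes that each best-match association carries an individual probability of identification, and it estimates the expected reliability above a cutoff as the mean probability of the retained associations and the expected completeness as the retained share of the summed probabilities. Both estimators, the function name `sample_statistics` and the input values are assumptions made for illustration; they are not necessarily the way the values in Table 3 were derived.

```python
def sample_statistics(probabilities, cutoff):
    """Return (n_kept, reliability, completeness) for a probability cutoff.

    Assumption: the expected number of true counterparts in a set of
    associations is the sum of their identification probabilities.
    """
    kept = [p for p in probabilities if p > cutoff]
    expected_true_all = sum(probabilities)
    expected_true_kept = sum(kept)
    reliability = expected_true_kept / len(kept) if kept else float("nan")
    completeness = (expected_true_kept / expected_true_all
                    if expected_true_all > 0 else float("nan"))
    return len(kept), reliability, completeness


# Made-up probabilities of identification, one per unique X-ray source.
example_probabilities = [0.99, 0.95, 0.92, 0.80, 0.55, 0.30, 0.05]

for cutoff in (0.0, 0.5, 0.9):
    n_kept, reliability, completeness = sample_statistics(
        example_probabilities, cutoff)
    print(f"cutoff={cutoff:.2f}  N={n_kept}  "
          f"R={reliability:.3f}  C={completeness:.3f}")
```

Raising the cutoff trades completeness for reliability, which is the behaviour summarised by the R and C rows of Table 3.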



Table 2. Number of unique 2XMMi sources (Nx) with Nc SDSS
DR7 candidates.

Nc        1      2      3     4    5    6    7   8   >8
Nx   37 988   6059   1196   317   96   47   12   4    8




Table 3. Cross-correlation statistics.

                  Probability of identification cutoffs
               0.0      0.5      0.7      0.8      0.9     0.95
# det_id    60 567   53 347   49 527   46 387   40 193   32 610
# src_id    45 727   39 839   36 943   34 605   30 055   24 327
R            79.4%    91.7%    94.7%    96.2%    98.0%    99.0%
C           100.0%    96.8%    92.1%    87.5%    77.2%    63.2%
FracX        62.1%    54.1%    50.2%    47.0%    40.8%    33.0%




Note 1. Number of associations with all XMM detections (det_id) and
all unique sources (src_id) with a probability of identification above
a given cutoff. If an X-ray source has several candidates, we only keep
the one with the best probability of identification. Sample reliability
(R) and sample completeness (C): we only consider the best match for
each unique XMM source having at least one counterpart. FracX is the
expected fraction of X-ray sources with a counterpart in the SDSS DR7
catalogue.

The left panel of Fig. 3 shows the distribution of the individual
SDSS source identification probabilities. Most SDSS entries
found within the combined 3σ search radius from the 2XMMi
source have a high likelihood of being the true optical counterpart.
The small tail of very low identification probability objects
reflects the expected rising contribution of SDSS entries unrelated
to the X-ray source at large matching distances. Most SDSS entries
with identification probability higher than ~ 90% are found
less than 3 arcsec from the X-ray position (Fig. 3, centre and
right panels). The rather wide spread of the 2XMMi positional
errors accounts for the scatter affecting the distances at which
high-probability SDSS sources are found from the X-ray position.

Expressed in terms of combined 2XMMi + SDSS errors, the
distance distribution shown in the right panel of Fig. 3 follows
the usual shape of a Rayleigh distribution. Fitting this histogram
with a Rayleigh function plus a linear component, we obtain
σ_r = 0.865 for the Rayleigh curve parameter, 0.178 for the linear
slope and R = 0.675 for the ratio between the total number
of real associations and the total number of spurious ones within
the search radius. Fitting separately the distance histograms of
the sources whose positions were corrected by eposcorr and of the
uncorrected ones leads to σ_r = 0.856, R = 0.698 and σ_r = 1.013,
R = 0.50 respectively. All errors on σ_r, on the slope and on
the R values are about 0.003. Keeping only the best candidate
for each unique XMM source, we obtain R = 0.81, which is
consistent with the value of 79.4% given in Table 3. The origin of
this small apparent overestimate (~ 14%) of the positional errors
of eposcorr-corrected sources is so far unclear. In any case,
the effect of this slight change on the identification probabilities
is small. The global effect is to slightly decrease the
probabilities of SDSS entries matching at large distances and to
somewhat increase the probabilities of those located close to the
X-ray source.
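
The shape of the fitted model can be sketched as follows. For Gaussian positional errors, the distance of true counterparts expressed in units of the combined error follows a Rayleigh law, while unrelated objects of uniform sky density contribute a term that rises linearly with distance. The function names and the parametrisation (`n_true`, `slope`) are assumptions made for this illustration and do not reproduce the fitting code used for the catalogue.

```python
import math


def rayleigh_pdf(x, sigma_r):
    """Rayleigh probability density at normalised distance x."""
    return (x / sigma_r ** 2) * math.exp(-x ** 2 / (2.0 * sigma_r ** 2))


def rayleigh_cdf(x, sigma_r):
    """Fraction of true counterparts found within normalised distance x."""
    return 1.0 - math.exp(-x ** 2 / (2.0 * sigma_r ** 2))


def model_counts(x, n_true, sigma_r, slope):
    """Rayleigh term for true matches plus a linear term for spurious ones.

    n_true and slope are hypothetical normalisation parameters.
    """
    return n_true * rayleigh_pdf(x, sigma_r) + slope * x


# Fraction of true counterparts inside the 3 sigma search radius when the
# positional errors are exactly right (sigma_r = 1) and for the fitted value.
frac_nominal = rayleigh_cdf(3.0, 1.0)
frac_fitted = rayleigh_cdf(3.0, 0.865)
print(f"sigma_r = 1.000: {frac_nominal:.4f}")
print(f"sigma_r = 0.865: {frac_fitted:.4f}")
```

A fitted σ_r below 1 means that the true counterparts are more concentrated than the quoted errors imply, which is the sense of the apparent overestimate of the positional errors discussed above.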

The practical implementation is described in Appendix |E] 
Whenever optical data are used, we discard SDSS entries
with recorded magnitudes fainter than 22.2 in any of the
photometric bands considered. Indeed, objects with magnitudes
higher than 22.2 tend to have smaller photometric errors than
brighter ones, clearly indicating that SDSS photometric
uncertainties and perhaps also mean values are not reliable at
faint flux. We also ignored all SDSS entries having one of the
following flags set: BLENDED, DEBLENDED_AS_MOVING,
SATURATED, INTERP_CENTER, EDGE, SATUR_CENTER,
PSF_FLUX_INTERP, in order to ensure the best photometric
quality. Unless specified otherwise, we will hereinafter only
consider optical identifications with a probability larger than 90%.
This threshold applies to both the spectroscopically identified
sample and the general photometric sample and corresponds
to an overall sample purity of 98% (see Table 3).

F.-X. Pineau et al.: Cross-correlation of the 2XMMi catalogue with Data Release 7 of the Sloan Digital Sky Survey

Fig. 3. Left: Histogram of the individual probabilities of identification. Centre: probabilities of identification versus matching
distances. Right: distribution of the distance of SDSS candidates to 2XMMi sources expressed in units of the combined 2XMMi +
SDSS positional error; black - all matches, red - identification probabilities > 90%.
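
The quality cuts described in this section (magnitude limit, photometric flags and probability threshold) can be sketched as a simple filter. The record layout used here (dictionaries with `flags`, `mag` and `p_id` keys) and the function names are hypothetical and chosen only for the example; the flag names, the 22.2 mag limit and the 90% threshold are those quoted in the text.

```python
REJECTED_FLAGS = frozenset({
    "BLENDED", "DEBLENDED_AS_MOVING", "SATURATED", "INTERP_CENTER",
    "EDGE", "SATUR_CENTER", "PSF_FLUX_INTERP",
})
MAG_LIMIT = 22.2


def has_good_photometry(entry, bands=("u", "g", "r", "i", "z")):
    """True if no rejected flag is set and all used bands are bright enough."""
    if REJECTED_FLAGS & set(entry["flags"]):
        return False
    return all(entry["mag"][band] <= MAG_LIMIT for band in bands)


def select_reliable(associations, p_min=0.9, bands=("u", "g", "r", "i", "z")):
    """Keep associations above the probability threshold with good photometry."""
    return [a for a in associations
            if a["p_id"] > p_min and has_good_photometry(a, bands)]


# Made-up associations (hypothetical record layout).
associations = [
    {"p_id": 0.97, "flags": [], "mag": {"g": 19.8, "r": 19.5, "i": 19.3}},
    {"p_id": 0.95, "flags": ["SATURATED"], "mag": {"g": 13.0, "r": 12.6, "i": 12.5}},
    {"p_id": 0.93, "flags": [], "mag": {"g": 22.9, "r": 21.9, "i": 21.5}},
    {"p_id": 0.60, "flags": [], "mag": {"g": 20.1, "r": 19.9, "i": 19.8}},
]
clean = select_reliable(associations, bands=("g", "r", "i"))
print(len(clean))
```

Passing the bands explicitly mirrors the statement that the magnitude cut applies only to the photometric bands actually used in a given analysis.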

6. Building an identified sample 

One of the important tasks given to the SSC is the statistical
identification and classification of all X-ray sources discovered in
the wide field of view of the EPIC cameras. The statistical
determination of the nature of any given 2XMMi source will first
rely on the assessment of the reliability of its association with
candidate counterparts at other wavelengths. The description of
this important step and of its results is the goal of the present
paper.

On the other hand, the subsequent classification stage requires
the knowledge of the parameter space occupied by the
various groups of astrophysical sources using a "learning sample".
Therefore, the cross-correlation method presented here allows
us to select in a clean and statistically controlled manner the
best optical counterparts to 2XMMi sources and constitutes the
first mandatory step towards building a reliable learning sample,
which can later be used to define source classification schemes
using advanced statistical methods. Eventually, the classification
method, either supervised or not, will provide the most likely
nature of the 2XMMi source (e.g. star, AGN, etc.) with, for some
methods, an estimate of the probability of the classification. First
attempts to classify 2XMMi sources in two classes (stars and
extragalactic) have been presented in Pineau et al. (2009) and are
now implemented in the XCat-DB for the DR3 of the 2XMM
catalogue.

Yet it is also well known that a reliable classification can
only be achieved when the corresponding learning sample covers
the parameter space spanned by the group of objects to identify
as evenly as possible, see e.g. White (2008) or Richards et al.
(2004). Being aware of this important requirement, the SSC has
designed a general optical identification programme able to
explore the widely diverse natures of the X-ray emitting objects
discovered in the XMM source catalogues. Several wide field
identification campaigns are currently conducted at various
X-ray flux levels and galactic latitudes, which all aim at building
completely identified source samples. The nature of the high b
population is the scope of four distinct projects. The bright part
is studied by the Bright Sources Survey (XBS or BSS, Della
Ceca et al. 2004; Caccianiga et al. 2008). The XMM-Newton
Medium Sensitivity Survey (XMS, Barcons et al. 2002; Carrera
et al. 2007; Barcons et al. 2007) and the XMM-2dF Wide Angle
Survey (XWAS, Tedds et al. 2006) investigate the properties of
medium flux sources. The faintest source population is the scope
of the Subaru/XMM-Newton Deep Survey (SXDS, Ueda et al.
2008). Finally, the Galactic plane area is covered by the XMM-
SSC Galactic Plane Survey (Motch 2005; Motch et al. 2010).

The first step towards building a sample of 2XMMi sources
of known astrophysical nature was to select X-ray sources with
reliable SDSS DR7 spectroscopic counterparts of a known class
(i.e., with the specClass attribute pointing to an astrophysical
object). For our purpose, the three most important groups of
spectroscopic SDSS targets are the sample of quasar candidates
defined by Richards et al. (2004), the main galaxy sample
described in Strauss et al. (2002) and all stars belonging
to the legacy survey and to the Sloan Extension for Galactic
Understanding and Exploration programme (SEGUE, Yanny
et al. 2009). The AGN sample is mostly a classically UV-excess
(UVX) selected sample to which is added a small number of
redder targets appearing as likely high redshift QSOs. The galaxy
sample is less biased because it is only selected on brightness
related criteria in the r band. Stars from the legacy survey were
mostly selected on the basis of their extreme colours. Among
them, red dwarfs and CVs are the most likely to match 2XMMi
sources. The SEGUE programme opens new areas at lower
galactic latitudes, and its spectroscopic target selection aims at
covering all spectral types.

We therefore extracted the SDSS spectroscopic catalogue
accessible via CasJob and, following the SDSS spectral class
scheme, defined the classes: stars, galaxies, AGN and X-ray
accreting binaries. We list below the origin of the different groups
of identified sources:



- Stars: i) 2XMMi/SDSS associations having the specClass
  attribute set to 1 or 6 and ii) the sample of stars coming from
  the kernel density classification (see Sect. 6.1).
- Accreting binaries: i) objects in the Downes catalogue of
  cataclysmic variables (Downes et al. 2001). We used here
  the 2006 version, which contains many SDSS discoveries;
  ii) the Ritter catalogue of cataclysmic variables (Ritter &
  Kolb 2003) and iii) the Ritter catalogue of LMXRBs (Ritter
  & Kolb 2003).
- Galaxies: 2XMMi/SDSS associations with a probability of
  identification > 0.80 and with the specClass attribute set to
  2.
- AGN: i) sources from the Veron catalogue (Veron-Cetty &
  Veron 2006) and ii) SDSS DR7 objects associated with a
  2XMMi source with a probability of identification > 0.80
  and having the specClass attribute set to 3 or 4 (QSO or high
  z QSOs).

We use the range of X-ray luminosity to define several
groups of active galaxies and consider all extragalactic objects
as a single class. In particular, we do not make any formal
distinction between QSO and AGN^1. An X-ray source associated
with both a star and an accreting binary was flagged as an
accreting binary. We applied the same rules for star-AGN and
binary-AGN pairs of apparently conflicting nature.

^1 A large fraction of these AGNs are found in the Veron catalogue
(Veron-Cetty & Veron 2006), which is based on the DR4. Among the
1290 SDSS DR7 spectroscopic QSO entries, 836 are also present in
the Veron catalogue and are classified QSOs or AGN according to their
absolute blue magnitude M_B.

We added sources identified in the XBS and XMS SSC surveys
and with SDSS counterparts. For the XMS, we considered
their sources with classes NELG, BLAGN, and BLLac as AGN.
For the XBS, AGN2, AGN1, BLLac and elusive AGN were
assigned the general AGN type.

QSO2 candidates taken from Zakamska et al. (2003) and
Reyes et al. (2008) as well as a handful of X-ray selected objects
(see Sect. 8.1 below) having a reliable match in the 2XMMi
catalogue were added to the identified sample.

6.1. The stellar identified sample

Building a clean stellar sample turned out to be more difficult
because most stellar sources detected in X-rays have optical SDSS
magnitudes brighter than 15 mag and are flagged as saturated.
Furthermore, the SDSS DR7 spectroscopic database provided
only few cross-matches with acceptable properties (i.e.
non-saturated and probabilities of identification higher than 90%).
Therefore, in order to enlarge the stellar sample, we applied a
classification method allowing us to identify stars on the basis of
their multi-colour properties.

We performed a kernel density classification (KDC,
Richards et al. 2004) on all spatially unresolved (cl=6) SDSS
candidates. This selection returns 10 533 SDSS sources with
a correlation in the 2XMMi catalogue. The classification
only uses the four colours u - g, g - r, r - i and i - z as
parameters. The learning sample used for this classification
consists of two classes, star and QSO, since we only
consider point-like objects in the optical. It has been built
from all unresolved SDSS sources, independently of their
association with a 2XMMi entry. We only retained good
quality detections (i.e. no flag SATURATED, BLENDED,
DEBLENDED_AS_MOVING, INTERP_CENTER, EDGE,
SATUR_CENTER or PSF_FLUX_INTERP set) that were
spectroscopically identified in the DR7. The data have been retrieved
from the DR7 database with CasJob. The stellar sample contains
67 269 sources flagged by the SDSS specClass attribute as star
(STAR or STAR_LATE) and therefore also contains CVs and
WDs. The non-stellar sample has 75 248 sources flagged by the
SDSS specClass attribute as QSO (QSO) or high-redshift QSO
(HIZ_QSO) plus 253 sources flagged by specClass as galaxy
(GALAXY). For simplicity we call the non-star sample the AGN
sample below.

Table 4. Results of the classification method applied to the
learning sample.

org\assign     Star      AGN
Star         96.89%    3.11%
AGN           1.62%   98.38%

Estimates of the probability densities were computed using
a fixed bandwidth kernel smoothing. The kernel applied uses the
Epanechnikov profile and the bandwidth was chosen to be equal
to 0.2 mag. Table 4 lists the results of the self-check of the
learning sample, i.e., the results of the classification method applied
to the learning sample only.

The prior probability p(star) has been set to 0.25 and so
p(AGN) to 0.75 as a result of iterative kernel density
classifications converging to this relative number of stars and AGN in the
SDSS/2XMMi learning sample. In order to select SDSS/2XMMi
identifications with the best chance to be normal stars, we
removed 13% of all SDSS entries classified as stars, but falling in
low-density regions of the parameter space (i.e. far from the
centre of the stellar multi-colour locus) and thus prone to be
doubtful cases such as binaries, unidentified cataclysmic variables, or
even misidentified AGN.
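
A kernel density classification of this kind can be sketched in a few lines. The sketch assumes a product of one-dimensional Epanechnikov kernels over the four colours, which is only one possible reading of "Epanechnikov profile"; the function names and the toy learning samples are hypothetical, while the 0.2 mag bandwidth and the prior p(star) = 0.25 are the values quoted in the text.

```python
def epanechnikov(u):
    """One-dimensional Epanechnikov kernel, non-zero only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_density(point, sample, bandwidth=0.2):
    """Fixed-bandwidth density estimate at `point` (product kernel assumed)."""
    total = 0.0
    for reference in sample:
        weight = 1.0
        for x, y in zip(point, reference):
            weight *= epanechnikov((x - y) / bandwidth) / bandwidth
            if weight == 0.0:
                break
        total += weight
    return total / len(sample)


def star_probability(colours, stars, agn, prior_star=0.25, bandwidth=0.2):
    """Posterior probability of the star class, or None if both densities vanish."""
    d_star = prior_star * kernel_density(colours, stars, bandwidth)
    d_agn = (1.0 - prior_star) * kernel_density(colours, agn, bandwidth)
    if d_star + d_agn == 0.0:
        return None  # point lies outside the support of both learning samples
    return d_star / (d_star + d_agn)


# Toy learning samples in (u-g, g-r, r-i, i-z); the values are made up.
toy_stars = [(1.20, 0.50, 0.20, 0.10), (1.25, 0.55, 0.20, 0.10)]
toy_agn = [(0.20, 0.20, 0.10, 0.05), (0.25, 0.15, 0.10, 0.05)]
print(star_probability((1.22, 0.52, 0.20, 0.10), toy_stars, toy_agn))
```

Because the kernel has finite support, a source far from both learning samples receives no classification at all, which is one way of flagging objects that fall in low-density regions of the colour space.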

The star/AGN classification has been made according to
the optical properties of the spectroscopically identified SDSS
objects. However, by construction, the density distribution in
colours of the SDSS/2XMMi sample is not likely to follow that
of the non-X-ray emitting SDSS objects and can thus lead to
some biases. For instance, there may be a considerable
overdensity of non-X-ray emitting stars in some part of the 4-D colour
diagram where most objects classified as AGN appear to be strong
X-ray sources. This problem indeed occurs in the region covered
by the AGN branch, where there is some overlap with A stars.
Although some A stars do emit X-rays for debated reasons, not
all do. We thus removed from the SDSS/2XMMi learning sample
all classified stars with a u - g colour of less than 1.2 (values
taken from Covey et al. 2007).

The final stellar X-ray sample arising from the KDC contains
636 unique entries with a classification probability higher than
99.7% (3 Gaussian σ). However, only 549 of these matches have
a probability of identification with a 2XMMi source higher than
90% and were therefore entered in the final identified sample.

We also checked that the stellar SDSS/2XMMi sample
adhered to the stellar locus derived by Covey et al. (2007) using
synthetic photometry in the 4-D colour space. The agreement is
good, apart from the reddest stars of spectral type later than ~ M5.

6.2. The final identified sample 

The origin and distribution of the various classes of identified
objects in the final sample are listed in Tables 5, 6 and 7. Most
of the extragalactic identified sample comes from the DR7
spectroscopic catalogue through the Veron catalogue, while the vast
majority of X-ray active stars are actually extracted from the
KDC source classification. A large fraction of the identified
accreting binaries (mostly cataclysmic variables) also come from
follow-up SDSS discoveries through the Downes catalogue.

Table 5. Distribution of XMS and XBS sources in the identified
samples.

             Galaxy         AGN           QSO
Sample     XMS   XBS     XMS   XBS     XMS   XBS
All          1             7    21      49    95
Final        1             5    14      37    85

Note 2. The "Final" sample only contains associations with a
probability of identification > 0.9, with SDSS sources unsaturated, not
blended, and where all magnitudes < 22.2 mag.

Table 6. Origins of the objects of each type in the final identified
sample.

              Star              Accreting Binary             Extragalactic
Sample     DR7    KDC    Downes (a)  Ritter (b)  Ritter (c)    DR7    Veron
Final        8    541        29          22           2       2021     1524

(a) Downes catalogue of CV.
(b) Ritter catalogue of CV.
(c) Ritter catalogue of LMXB.

Note that individual sources can be listed in several catalogues.

Table 7. Distribution of number of unique 2XMMi entries with
types in the final identified sample.

Sample    Star    Accr. Binary    Extragalactic
Sub        549         26             2336




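Finally, for all AGNs with a spectroscopic redshift we
compute the observed X-ray luminosity using

$$ L_{\rm obs} = \frac{4\pi F}{(1+z)^{2-\gamma}} \left[ (1+z)\,\frac{c}{H_0} \int_0^z \frac{dz'}{\sqrt{\Omega_m (1+z')^3 + \Omega_\Lambda}} \right]^2 \qquad (16) $$

where F is the 0.2-12 keV X-ray flux in erg s^-1 cm^-2, H_0 =
73 km s^-1 Mpc^-1, Ω_m = 0.3 and Ω_Λ = 0.7. The photon index
γ was taken to be 1.9.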
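
A numerical sketch of this kind of calculation is given below. It assumes the standard flat ΛCDM luminosity distance and the usual power-law K-correction factor (1+z)^(γ-2); these are textbook forms and may differ in detail from Eq. (16). The function names, the Simpson integration and the numerical constants are choices made for the example, while the cosmological parameters and the photon index are those quoted above.

```python
import math

C_KM_S = 299792.458            # speed of light in km/s
MPC_IN_CM = 3.0856775814914e24  # one megaparsec in cm


def luminosity_distance_mpc(z, h0=73.0, omega_m=0.3, omega_l=0.7, steps=1000):
    """Flat LCDM luminosity distance in Mpc, integrated with Simpson's rule."""
    if z < 0:
        raise ValueError("redshift must be non-negative")
    if z == 0:
        return 0.0

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    n = steps if steps % 2 == 0 else steps + 1
    h = z / n
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    comoving = (C_KM_S / h0) * total * h / 3.0
    return (1.0 + z) * comoving


def xray_luminosity(flux_cgs, z, gamma=1.9, h0=73.0, omega_m=0.3, omega_l=0.7):
    """Luminosity in erg/s from a flux in erg/s/cm^2 (power-law K-correction)."""
    d_cm = luminosity_distance_mpc(z, h0, omega_m, omega_l) * MPC_IN_CM
    return 4.0 * math.pi * d_cm ** 2 * flux_cgs * (1.0 + z) ** (gamma - 2.0)


# Example with a made-up source: F = 1e-13 erg/s/cm^2 at z = 1.
print(f"{xray_luminosity(1e-13, 1.0):.3e} erg/s")
```

With γ = 1.9 the K-correction is close to unity at all redshifts, so the result is dominated by the luminosity distance.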



7. Grouping X-ray sources in parameter space 

7.1. Sample properties and shortcomings 

The left and centre panels of Fig. 4 display the X-ray flux
distributions of 2XMMi sources with individual identification
probabilities with SDSS DR7 entries > 0% and > 90%, compared
to that of all 2XMMi sources present in the SDSS DR7 footprint.
Many faint X-ray sources have likely counterparts in the
SDSS. Figure 4 also shows that the fraction of 2XMMi sources
with likely SDSS identifications does not vary strongly with X- 
ray flux. It steadily decreases by a factor of 2 from Fx ~ 1 x ~ 



^ s ' . The drop in the 

'^^^ erg cm"^ s"' is proba- 



lO"'-' ergcm"'^ s"' to 1 X ~ 10"' 'erg cm 
identification rate at a flux above ~ 10 
bly caused by the increasing number of bright optical counter- 
parts, which are likely to be flagged as saturated in SDSS and 
therefore absent from our sample. On the other hand, the shape
of the decline of the SDSS identified fraction with decreasing
X-ray flux (centre panel) as well as the observed distribution of
the fx/fopt ratio with X-ray flux (right panel) are both consistent
with populations of X-ray sources with a weakly varying
distribution of fx/fopt ratios with X-ray flux. In other words,
comparing the SDSS identified sample with the total sample does
not reveal evidence of strong evolution of fx/fopt with redshift.
A comparable conclusion was reached in Sect. 4 based on the
weak dependency with X-ray flux of the reliability / likelihood
ratio relations. It may thus be possible to extrapolate the
properties of the 2XMMi/SDSS DR7 photometric identified sources to
somewhat fainter X-ray and optical fluxes.

The situation of the "final" spectroscopic identified sample
clearly differs. Its X-ray flux distribution strongly differs from
that of the 2XMMi/SDSS DR7 photometric sample and from
that of the overall 2XMMi sample. Obviously, this discrepancy
arises from the higher optical brightness needed by
spectroscopic observations (see left panel of Fig. 4). In addition, the
choice of the spectroscopic targets results from various
heterogeneous optical selection criteria and is therefore unlikely to
cover all X-ray emitting objects equally above any given X-ray flux
threshold. Some examples are outlined in the sections below. We
therefore stress that, as it stands, this identified sample cannot
in any manner be used as a learning sample suitable for a
statistically reliable classification of 2XMMi sources with SDSS
identifications.

However, the high number of spectroscopic SDSS matches
supplemented by other identifications derived from archival
catalogues allows us to build an unprecedentedly large sample of
X-ray sources of known nature, enriched with accurate
multi-colour photometry and detailed spectral line measurements. This
large collection of best quality data offers a unique opportunity
to study to some extent the parameter loci occupied by the
different classes of X-ray emitters. It also allows us to address
two important issues: first, finding the most efficient physical
parameters for separating different groups of X-ray sources;
second, highlighting the parameter regions not well
covered by the SDSS observing strategy and therefore in need of
extended spectroscopic studies.

The huge merit of the dedicated identification programmes 
such as the ones carried out by the SSC is to extend spectro- 
scopic identifications to very low optical fluxes and thus offer a 
unique opportunity to unveil the different populations of extra- 
galactic sources which may appear at fainter optical fluxes and 
in general at higher X-ray to optical flux ratios. The two strate- 
gies, wide and shallow on one hand and narrow and deep on the 
other hand are indeed quite complementary and suited to best 
characterise and scientifically investigate the entire serendipitous 
XMM-Newton catalogues. 

7.2. The main classes of X-ray sources

We investigate the distribution of the various classes of X-ray
sources in the original instrumental parameter space. In
principle, we could also have used the parameter space resulting
from a Principal Component Analysis (PCA), thus highlighting
the most significant (or information-rich) linear combination
of physical measurements. Pineau et al. (2008) showed that
the first two eigenvectors deriving from the PCA of the
2XMMi/SDSS DR7 sample, which gather the largest data variance,
are indeed close to the two main ones used here, namely X-ray to
optical flux ratios and optical colours. However, taking into
account eigen-axes of higher orders, which include X-ray spectral
information in the form of hardness ratios, can slightly improve
the separation between different classes of sources (Pineau et al.
2008).

We show in Figs. 5 and 6 the positions in the g - i / log(fx/fr)
diagram of the various classes of objects present in the identified
sample (left panel) and of all 2XMMi sources with only a
cross-identification with a photometric SDSS entry (right panel).
Sources spatially unresolved and extended in the optical band are
presented separately in the two figures. A majority of 2XMMi
sources match with SDSS DR7 photometric entries close to the
limiting magnitude of the SDSS survey (mag ~ 22), and only a
relatively small fraction is bright enough to have been selected
for spectroscopic observations. For instance, out of a grand total
of 60 567 2XMMi detections matching a SDSS DR7 entry, 87%
have an error on their r magnitude below 0.2, but only 12% are
SDSS spectroscopic targets as well. In addition, the distribution
of the spectroscopic targets is far from uniformly covering the
parameter space spanned by the optical counterparts of the
serendipitous XMM-Newton sources.

Fig. 4. Left panel: X-ray flux distribution. From top to bottom, black line: all X-ray sources found in 2XMMi fields overlapping
SDSS DR7, red line: all X-ray sources matching a SDSS entry, red dashed line: 2XMMi/SDSS DR7 correlations with individual
identification probabilities > 90%, blue line: "final" identified sample. Centre panel: ratio of the SDSS identified to total number of
2XMMi sources. Upper curve, filled squares: all matches; lower curve, empty squares: matches with identification probabilities >
90%. Right panel: Variation of log(fx/fr) with X-ray flux for 2XMMi/SDSS DR7 sources with identification probabilities > 90%
and r magnitude brighter than 22.2. In all cases, the combined unique EPIC source detection likelihood is > 6.

Fig. 5. Distribution of spatially unresolved objects in the optical band in the log(fx/fr) versus g - i diagram. Left: the identified
sample. Blue: AGNs with log(Lx) > 44, green: 44 > log(Lx) > 42, magenta: QSO2s (filled squares = X-ray selected, filled
triangles = optically selected, encircled = Compton Thick; see Sect. 8.1), red: stars, cyan: accreting binaries. Right: the entire
SDSS photometric sample. In this case, the colour codes the range of r magnitude, black: <18, red: 18-20, green: 20-21, blue: >21.
We only show SDSS entries with a probability of identification with an X-ray source higher than 90%, g and i magnitudes brighter
than 22.2 and errors on g - i < 0.2.

7.3. Separating stellar from extragalactic sources 

As expected, the fx/fr ratio is a very powerful parameter to
separate the late-type stellar X-ray population, in which the
high-energy emission arises in a magnetically active corona, from X-ray
luminous sources powered by accretion such as active galactic
nuclei or cataclysmic variables. However, the distribution of
low Lx galaxies in log(fx/fr) clearly overlaps with that of active
coronae. Introducing the g - i colour index allows us to separate
the bulk of the stars, especially the reddest M stars, from most
galaxies. Nevertheless, many galaxies, in particular of the early
type, exhibit optical energy distributions similar to those of G
type stars, show comparable fx/fr and consequently cannot be
easily distinguished from stars in the g - i / log(fx/fr) diagram.
Obviously, taking into account the spatial extension of the
optical source allows us to efficiently separate them from stars (see
Figs. 5 and 6).

Fig. 6. Distribution of spatially resolved objects in the optical band in the log(fx/fr) versus g - i diagram. Left: the identified sample,
blue: AGNs with log(Lx) > 44, green: 44 > log(Lx) > 42, yellow: galaxies with log(Lx) < 42 AGN, magenta: QSO2s (filled
squares = X-ray selected, filled triangles = optically selected, encircled = Compton Thick; see Sect. 8.1), red: stars, cyan: accreting
binaries. Right: Entire SDSS photometric sample. In this case, the colour codes the range of r magnitude. Black: <18, red: 18-20,
green: 20-21, blue: >21. We only show SDSS entries with a probability of identification with an X-ray source larger than 90%, g
and i magnitudes brighter than 22.2 and errors on g - i < 0.2.



Interestingly, the reddest point-like optical sources located
on the "stellar" branch are also the faintest ones, with r
magnitudes in the range of 18 to 20 (see Fig. 5, right panel). They also
appear to exhibit the highest fx/fr ratio. This is consistent with
the known increase of the fx/fopt ratio for M stars compared to
that of earlier spectral types (see e.g. Vaiana et al. 1981). We
note, however, that some high z QSOs have been identified with
very red point-like objects of fx/fr ratios approaching those of
active coronae (see Fig. 5).



Although cataclysmic variables occupy a locus in the g - i /
log(fx/fr) diagram comparable to that of most quasars, their
distribution exhibits a wider spread than that of AGNs. This large
scatter can be used to provide a high likelihood identification of
their class, at least for part of them. For instance, very blue objects,
typically with g - i below -0.2, as well as those with extreme fx/fr,
have a high probability of being cataclysmic variables.



7.4. Distinguishing between the various classes of
extragalactic sources

Figure 5 shows that many of the 2XMMi sources having a
counterpart in the DR7 of the SDSS cluster in a rather narrow range of
blueish g - i colours in the interval of -0.2 to 0.8. They are char- 
acterised by a logifxl fr) ~ and appear as point-like sources in 
the optical. Their position in this diagram overlaps with that of
the vast majority of the spectroscopic SDSS AGN found in our
identified sample, most of which are UV-excess optically
selected quasars.

Let us now consider all objects, both spatially resolved and
unresolved, occupying the UV excess quasar region (g - i values
between -0.2 and +0.8 and log(fx/fr) > -1.2). In this
range of parameters, the mean log(fx/fr) of the spectroscopically
identified sample appears slightly shifted by ~ 0.3 dex to lower
values (i.e. 0.75 mag brighter for a given X-ray flux), compared
to that of the photometric sample. Since the mean r magnitudes
of the corresponding spectroscopic and photometric-only groups
are 18.86 and 20.73 respectively, as a result of the necessarily
brighter optical flux limit of the spectroscopic sample, this
indicates that the photometric sample is dominated by a slightly
more remote population of AGN, hence fainter in X-rays and in
the optical than the spectroscopic sample, albeit with a somewhat
larger mean fx/fr. It can also be seen in Fig. 7 that these UVX
spectroscopically identified quasars are the most energetic, with
X-ray luminosities in excess of 10^44 erg/s. In general,
Fig. 5 shows that the spectroscopically identified sample
of point-like objects covers the range of parameters populated
by the photometric cross-identifications for both AGN and stars
relatively well, except, as noted above, for the faintest optical
matches. This identified sample could thus be used as a learning

Fig. 7. Distribution of extragalactic objects in the log(fx/fr)
versus g - i diagram according to Lx. Black; > 10"^^, red 10^"* to 10"*^,
green lO"*^ to 10'*^ blue lO'*^ to lO^^, yellow 10^^ to lO^^^. We
only show SDSS entries with a probability of identification with
an X-ray source higher than 90%, g and i magnitudes brighter
than 22.2 and errors on g - i < 0.2.



sample to statistically identify and classify X-ray sources with 
optical counterparts of comparable brightness. 
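
The selection box and the dex-to-magnitude conversion quoted in this subsection can be written down explicitly. The sketch below only encodes the limits given in the text (g - i between -0.2 and +0.8, log(fx/fr) > -1.2) and the standard relation between a flux ratio in dex and a magnitude difference; the function names and the inclusive treatment of the colour boundaries are assumptions made for the example.

```python
def in_uvx_quasar_region(g_minus_i, log_fx_fr):
    """True if a source falls in the UV-excess quasar region of the diagram."""
    return -0.2 <= g_minus_i <= 0.8 and log_fx_fr > -1.2


def dex_to_mag(delta_dex):
    """Magnitude difference corresponding to a flux ratio given in dex."""
    return 2.5 * delta_dex


# A shift of 0.3 dex in log(fx/fr) at fixed X-ray flux corresponds to an
# optical counterpart brighter by 2.5 * 0.3 mag.
print(dex_to_mag(0.3))
print(in_uvx_quasar_region(0.3, 0.1))   # blue point-like source, high fx/fr
print(in_uvx_quasar_region(1.5, -2.0))  # red source with low fx/fr
```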

This is at variance with the situation prevailing for extended
sources. As seen in Fig. 6, a considerable number of X-ray
sources are identified with red spatially extended photometric
objects, i.e., relatively faint reddish galaxies with g - i ≳ 1.0 and
corresponding log(fx/fr) > -0.5. Unfortunately, the SDSS policy
for selecting spectroscopic targets does not cover this region
of the parameter space well. In the few cases in which an optical
spectrum exists, they are assigned an AGN type. These galaxies
are significantly optically brighter than most UVX quasars,
most of which are in the r mag range of 18 to 21. Their derived
X-ray luminosities in the range of 10^42-44 erg s^-1 (Fig. 7)
clearly show that the vast majority of these reddish objects are
likely Seyfert galaxies. This population extends downwards to
lower fx/fr ratios, narrowing the g - i range spanned and
decreasing their X-ray luminosities. Eventually the brightest
objects (r ≲ 18) merge with the group of "normal" galaxies with
Lx ≲ 10^42 erg s^-1, which is well represented in the
spectroscopically identified sample. These low X-ray luminosities could be
explained in terms of ULXs, starbursts, or of a collection of
low-mass X-ray binaries in elliptical galaxies.
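
The luminosity ranges used to colour-code the extragalactic sources in Figs. 5 and 6 (log(Lx) > 44, 44 > log(Lx) > 42 and log(Lx) < 42) can be turned into a small helper. The group labels returned below are informal shorthand for the populations discussed in this section, not an official classification, and the handling of values lying exactly on a boundary is an arbitrary choice made for the example.

```python
def luminosity_group(log_lx):
    """Assign an extragalactic source to a luminosity group from log10(Lx)."""
    if log_lx > 44.0:
        return "high-luminosity AGN (log Lx > 44)"
    if log_lx > 42.0:
        return "intermediate-luminosity AGN (42 < log Lx <= 44)"
    return "low-luminosity galaxy (log Lx <= 42)"


# Made-up luminosities in erg/s, given as log10 values.
for log_lx in (44.6, 43.1, 40.8):
    print(log_lx, "->", luminosity_group(log_lx))
```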



8. Science cases 

In the next two sections we touch upon two distinct science
cases, one in the extragalactic domain and one related to a
galactic source population. These two examples aim at illustrating the
range of research that these clean cross-correlated samples allow
and do not explore in depth all possible paths of investigation.
In particular, we do not make use of the spectroscopic line data,
which could provide many additional astrophysical diagnostics.
The first case considered bears on the topical search for QSO2s,
while the second one explores the X-ray and optical properties
of active stellar coronae.



8.1. Searching for QSO2 candidates

The members of the high-luminosity, high-obscuration part of
the AGN population are commonly denominated QSO2s. Synthesis
models of the X-ray background (XRB; Gilli et al. 2007; Treister
et al. 2009) predict that up to 20% of the XRB (Gilli et al. 2007)
could be produced by QSO2s. They could represent ~30-50%
of the high-luminosity AGN population (e.g. Della Ceca et al.
2008), and they could probably co-evolve with massive host
galaxies (Severgnini et al. 2006). Hunting for QSO2s therefore
remains one of the most topical activities, as is the search for
associated X-ray and optical signatures.

Optical selection of QSO2s relies on finding objects showing
only narrow emission lines with high-ionisation line ratios
and high luminosity, typically from [OIII], e.g. L[OIII] > 10^8.3 L⊙
(Zakamska et al. 2003; Reyes et al. 2008). X-ray selection looks
instead for luminous (Lx,2-10keV > 10^44 erg s^-1), significantly
obscured (column density NH > 10^22 cm^-2) sources, which are best
selected in the E > 2 keV hard X-ray band (e.g. Mainieri et al.
2002; Caccianiga et al. 2004; Perola et al. 2004; Vignali et al.
2006; Della Ceca et al. 2008; Krumpe et al. 2008). Within the
Unified Model (Antonucci 1993), obscuration of the central
X-ray-emitting and Broad Line-emitting regions by an intervening
torus gives rise to those consistent properties across both bands.

A priori, QSO2s should present red optical colours (since the
emission of the host galaxy would dominate over the obscured
AGN), a high X-ray hardness ratio (because of the predominant
absorption of the lower energy X-rays) and a high X-ray-to-optical
flux ratio (fx/fr, since the X-rays are less sensitive to absorption
than the optical range). However, several effects could alter this
simple recipe. For instance, Compton Thick absorption (defined
here as NH ≳ 10^24 cm^-2) would completely absorb direct X-rays
up to 10 keV. This would alter the spectral shape (since scattered
primary X-rays have "softer" spectra), and the X-ray-to-optical
flux ratio would be more typical of "normal" galaxies
(i.e. log(fx/fr) < -1).

We investigate here whether the position in the overall X-ray
and optical parameter space of the confirmed and candidate QSO2
discovered so far could give some hint on the way other candidates
could be selected on the basis of broadband high-energy
XMM-Newton data and optical photometry only.

To do so we first assembled a sample of bona fide QSO2
from optical and X-ray surveys. The optically selected sample
has been obtained by cross-correlating the SDSS samples of
Zakamska et al. (2003) and Reyes et al. (2008) with the 2XMMi
catalogue. We list in Table 9 the main properties of the QSO2
SDSS candidates matching a serendipitous EPIC source. The
listed X-ray luminosities were computed assuming an average
shape (see Sect. 6.2) for the large band 0.2 to 12 keV energy
distribution and are not corrected for intrinsic absorption. We also



The XMM-Newton EPIC hardness ratios use five energy bands,
which expressed in keV units are 0.2-0.5, 0.5-1, 1-2, 2-4.5 and
4.5-12.0. Hardness ratio i is expressed as

$$\mathrm{HR}_i = \frac{C_{i+1} - C_i}{C_{i+1} + C_i} \tag{17}$$

with C_i the count rate in band i corrected for vignetting.
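As a minimal sketch of Eq. (17), the hardness ratios can be computed from the five vignetting-corrected band count rates as below. The function names and the example count rates are hypothetical and chosen only for illustration; only the band edges and the formula come from the text.

```python
# Sketch of Eq. (17): HR_i = (C_{i+1} - C_i) / (C_{i+1} + C_i).
# Function names and sample rates are illustrative assumptions.

# EPIC band edges in keV, as listed in the text (bands 1 to 5).
EPIC_BANDS_KEV = [(0.2, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 4.5), (4.5, 12.0)]


def hardness_ratio(rate_soft, rate_hard):
    """Return the hardness ratio between two adjacent bands."""
    total = rate_hard + rate_soft
    if total <= 0:
        raise ValueError("the summed count rate must be positive")
    return (rate_hard - rate_soft) / total


def all_hardness_ratios(rates):
    """Return [HR1, HR2, HR3, HR4] from the five band count rates."""
    if len(rates) != len(EPIC_BANDS_KEV):
        raise ValueError("expected one count rate per EPIC band")
    return [hardness_ratio(rates[i], rates[i + 1]) for i in range(len(rates) - 1)]


if __name__ == "__main__":
    # Hypothetical count rates (counts per second) in bands 1 to 5.
    example_rates = [0.010, 0.020, 0.015, 0.006, 0.002]
    for index, hr in enumerate(all_hardness_ratios(example_rates), start=1):
        print(f"HR{index} = {hr:+.3f}")
```

By construction each ratio lies between -1 (all counts in the softer band) and +1 (all counts in the harder band).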

These luminosities are necessarily less accurate than those derived
by Ptak et al. (2006) from a detailed spectral analysis; however, the four
candidates in the list of Zakamska et al. (2003) in common with the
g - i    Ntot    r mag (rms)      pn HR2 (rms)      pn HR3 (rms)      pn HR4 (rms)

point-like sources
+0.25    9538    +20.20 (1.07)    -0.115 (0.228)    -0.397 (0.237)    -0.219 (0.376)
+0.75    3034    +20.59 (0.99)    -0.045 (0.250)    -0.360 (0.246)    -0.199 (0.330)
+1.25     471    +20.51 (0.84)    +0.061 (0.352)    -0.294 (0.317)    -0.188 (0.338)
+1.75      96    +20.17 (0.97)    +0.168 (0.280)    -0.229 (0.294)    -0.150 (0.347)
+2.25      31    +19.79 (1.02)    +0.200 (0.315)    -0.307 (0.282)    -0.168 (0.598)
+2.75      29    +19.04 (1.31)    -0.081 (0.368)    -0.447 (0.302)    -0.162 (0.545)

extended sources
+0.25     812    +20.97 (0.84)    -0.099 (0.222)    -0.385 (0.255)    -0.175 (0.425)
+0.75    1915    +20.23 (1.46)    -0.100 (0.237)    -0.370 (0.235)    -0.246 (0.311)
+1.25    1918    +19.77 (1.44)    -0.009 (0.307)    -0.285 (0.335)    -0.149 (0.341)
+1.75    1364    +19.89 (0.97)    +0.172 (0.377)    -0.111 (0.452)    -0.028 (0.383)
+2.25     479    +20.00 (0.60)    +0.233 (0.414)    -0.012 (0.482)    -0.018 (0.337)
+2.75      35    +19.81 (0.78)    +0.027 (0.456)    +0.056 (0.499)    +0.027 (0.395)

Note 3. We only consider here sources with error on g - i < 0.2, g and i mags brighter than 22.2 and individual probability of identification greater
than 90%.





Fig. 8. Left panel: Distribution of spectroscopically identified objects (optically resolved and unresolved) in the log(fx/fr) versus
EPIC pn HR2 diagram. Blue: AGNs with log(Lx) > 44, green: 44 > log(Lx) > 42, yellow: galaxies with log(Lx) < 42, magenta:
Type 2 QSO (filled squares = X-ray selected, filled triangles = optically selected, encircled = Compton Thick). Right panel: the
entire SDSS photometric sample. Black: unresolved objects, green: extended objects, magenta: same as in left panel. We only
show SDSS entries with a probability of identification with an X-ray source higher than 90%, r magnitudes brighter than 23 and
errors on HR2 less than 0.3.



marked in the table the eight SDSS QSO2 that are good candidates
to be Compton Thick. To define these objects we used
Lx,meas/Lx,[OIII] < 0.0^, where Lx,meas is the observed 2-10 keV
luminosity (see above) and Lx,[OIII] is the expected intrinsic
X-ray luminosity (the latter has been computed using the observed
L[OIII] and the ratio L[OIII]/Lx ~ 0.017 derived for the unobscured
view of Seyfert galaxies; Heckman et al. 2005).
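The criterion above can be sketched as follows. This is an illustration only: the function names are hypothetical, both luminosities are assumed to be given as log10 values in the same units (erg s^-1), and the threshold is left as an explicit argument because its value is not legible here. The example uses the 0.01 limit that the accompanying footnote attributes to Vignali et al. (2010).

```python
import math

# Ratio L[OIII]/Lx quoted in the text for unobscured Seyfert galaxies.
LOIII_TO_LX = 0.017


def expected_log_lx(log_l_oiii):
    """Expected intrinsic log10(Lx) inferred from the observed log10(L[OIII])."""
    return log_l_oiii - math.log10(LOIII_TO_LX)


def is_compton_thick_candidate(log_lx_measured, log_l_oiii, max_ratio):
    """True when Lx,meas / Lx,[OIII] falls below the chosen threshold."""
    ratio = 10.0 ** (log_lx_measured - expected_log_lx(log_l_oiii))
    return ratio < max_ratio


if __name__ == "__main__":
    # Hypothetical source: log L[OIII] = 42.0, measured log Lx = 41.0.
    print(round(expected_log_lx(42.0), 2))
    print(is_compton_thick_candidate(41.0, 42.0, max_ratio=0.01))
```

A source whose measured X-ray luminosity is far below the value expected from its [OIII] emission is flagged, which is the signature of heavy absorption that the text relies on.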



eight discussed in Ptak et al. (2006) (all Compton Thin QSO2) have
luminosities consistent within a factor of two.

Our criterion to define a Compton Thick AGN is slightly less
restrictive than that recently used by e.g. Vignali et al. (2010),
Lx,meas/Lx,[OIII] ≲ 0.01.



To this sample of optically selected QSO2 we added a small
sample of five X-ray selected QSO2 (with the definition above)
obtained by cross-correlating SDSS, 2XMMi, and a few selected
lists of X-ray defined QSO2 (Della Ceca et al. 2008; Krumpe
et al. 2008; Corral et al. 2010). It is worth stressing that, based
on a detailed analysis of the X-ray and optical spectral properties
(see Della Ceca et al. 2008; Corral et al. 2010), all these
X-ray defined QSO2 are Compton Thin (intrinsic NH between 10^22
cm^-2 and a few times 10^23 cm^-2).

The positions of the confirmed and candidate QSO2 in the
parameter space obtained using fx/fr, the optical colours (in
particular, g - i) and hardness ratio (in particular HR2) are shown in
Fig. 8 and in Fig. 9; we marked with different symbols the several
"flavours" of QSO2, i.e. the X-ray selected QSO2 (all Compton
Fig. 9. Left panel: Distribution of spectroscopically identified objects (optically resolved and unresolved) in the g - i versus EPIC
pn HR2 diagram. Blue: AGNs with log(Lx) > 44, green: 44 > log(Lx) > 42, yellow: galaxies with log(Lx) < 42, magenta: Type
2 QSO (filled squares = X-ray selected, filled triangles = optically selected, encircled = Compton Thick). Right panel: the entire
SDSS photometric sample. Black: unresolved objects, green: extended objects, magenta: same as in left panel. We only show
SDSS entries with a probability of identification with an X-ray source larger than 90%, r magnitudes brighter than 23 and errors on
HR2 less than 0.3.



Thin), the optically selected QSO2 and the candidate Compton
Thick QSO2.

As can be seen in Fig. 9, the QSO2 generally appear slightly
redder than the bulk of the unresolved AGN, and of the identified
galaxy sample having Lx > 10^44 erg s^-1. Their colours are more
similar to those of the identified galaxy sample with 10^42 < Lx <
10^44 erg s^-1 (green points in Fig. 9, left panel).

The position of the QSO2 in Fig. 8 and in Fig. 9 clearly
shows a separation between the "confirmed" Compton Thin
QSO2 (magenta filled squares and triangles) and the "candidate"
Compton Thick QSO2 (encircled). Interestingly, the
four "optically selected" QSO2 occupying the same region as
the X-ray selected Compton Thin QSO2 have been studied
in the X-ray domain by Ptak et al. (2006); all these sources
(2XMMi J011522.2+001518, 2XMMi J021047.0-100152,
2XMMi J122656.4+013124, 2XMMi J164131.6+385841) are
described by an absorbed power-law model with an intrinsic
NH ~ 2-3 × 10^22 cm^-2. Therefore the upper right corner of
Fig. 8 is probably the best place to look for Compton
Thin QSO2; as shown in e.g. Caccianiga et al. (2004), the very
positive HR2 reflects the relatively large intrinsic photoelectric
absorption present in many QSO2 and responsible for their
preferential discovery in hard X-ray surveys*.

Finally, we show in Fig. 10 the behaviour of the EPIC pn
HR2 hardness ratio with X-ray luminosity for all spectroscopic
SDSS targets. The bulk of the SDSS Type 1 QSOs with X-ray
luminosities higher than 10^44 erg s^-1 (0.2-12 keV) cluster around



* We caution however that the separation in HR2 properties between
Compton Thin and Compton Thick QSO2 is probably clear only for
sources with redshifts as high as in our learning sample (z ≲ 0.8); at
higher redshift we should measure decreasing HR2 values for Compton
Thin QSO2 because the observed X-ray energy band will move to
higher rest-frame energies, which are increasingly less affected by
absorption.



a hardness ratio <HR2> = -0.12; the same objects cluster
around hardness ratios <HR3> = -0.38 and <HR4> = -0.28.
These hardness ratios are in excellent agreement with the values
expected from a canonical Γ = 1.9 power-law X-ray spectrum
undergoing negligible intrinsic absorption and a mean Galactic
absorption of 1.16 × 10^20 cm^-2 (the average over all directions
of galaxy and QSO targets). For this group of QSOs, there is
no evidence of a strong dependence of the power-law index on
Lx. There is however a small number of QSOs exhibiting a
considerably harder X-ray spectrum (as testified by an increasing
value of HR2) extending to the same locus occupied by
confirmed Compton Thin QSO2. As one enters the AGN regime at
X-ray luminosities below 10^44 erg s^-1, the number of extended
sources with galaxy-like optical spectra rises considerably and
the shape of the X-ray energy distribution shows a much larger
scatter. The candidate Compton Thick QSO2 seem to populate
this part of the diagram.

A number of spectroscopic SDSS entries occupy the same
region of the Lx / HR2 diagram as the reference Compton
Thin QSO2. We explored the nature of these candidates by
selecting objects with log(Lx) higher than 44 and EPIC pn HR2
greater than 0.5 with an error of less than 0.2 on the hardness
ratio: twelve objects match these conditions. From an
inspection of the spectroscopic SDSS data and a literature search
we found that at least ~60% of them are indeed characterised
by absorption at some level: two objects are clearly
Broad Absorption Line QSOs (SDSS J114312.32+200346.0,
SDSS J141546.24+112943.4), two are "dust reddened QSOs"
(SDSS J122637.02+013016.0, SDSS J143513.90+484149.2)
and three are Type 2 QSOs (SDSS J105144.24+353930.7, SDSS
J130005.34+163214.8, SDSS J134507.93-001900.9). The
remaining five objects are apparently "normal" type 1 AGN,
without any specific comment in the literature: a detailed analysis of their
Table 9. QSO2 candidates from Zakamska et al. (2003), Reyes et al. (2008) and X-ray selected samples matching 2XMMi entries
with EPIC pn observations.

2XMMi name                   id prob   z       u-g             g-i             HR 2 (pn)      Log(Lx)   Log(fx/fr)

Zakamska et al. (2003)
2XMM J005621.6+003235^CT     0.964     0.484   1.891 ±0.643    1.800 ±0.061    -0.56 ±0.28    42.78     -0.626
2XMMi J011522.2+001518       0.997     0.390   6.023 ±1.239    2.437 ±0.046     0.73 ±0.04    44.26      0.566
2XMM J015716.9-005305        0.988     0.540   0.565 ±0.167    1.836 ±0.050    -0.18 ±0.46    42.78      0.285
2XMM J021047.0-100152        0.999     0.540   0.348 ±0.136    1.771 ±0.047     0.66 ±0.12    44.40      0.980
2XMM J103951.5+643005^CT     0.991     0.402   0.490 ±0.075    1.209 ±0.027    -0.68 ±0.33    43.02     -0.353
2XMM J122656.4+013124        0.998     0.732   0.970 ±0.256    1.134 ±0.056     0.76 ±0.06    44.83      1.063
2XMM J164131.6+385841        0.995     0.596   0.550 ±0.112    1.768 ±0.028     0.63 ±0.05    44.96      1.245

Additional candidates from Reyes et al. (2008)
2XMMi J075821.2+392337^CT    0.912     0.216   0.620 ±0.026    0.891 ±0.010    -0.04 ±0.27    42.37     -1.058
2XMM J093952.7+355358        1.000     0.137   1.527 ±0.051    1.534 ±0.007    -0.08 ±0.06    43.44     -0.040
2XMM J094506.4+035552^CT     0.995     0.156   0.979 ±0.043    1.092 ±0.025    -1.00 ±0.28    41.63     -1.730
2XMM J100327.8+554155        0.991     0.146   1.390 ±0.066    1.340 ±0.010    -1.00 ±0.17    42.40     -0.772
2XMM J103408.5+600152^CT     1.000     0.051   1.408 ±0.009    0.888 ±0.003    -0.48 ±0.05    42.29     -1.232
2XMM J103456.3+393939^CT     0.995     0.151   1.465 ±0.049    1.412 ±0.007    -0.68 ±0.29    42.88     -0.512
2XMM J122709.8+124855        0.998     0.194   1.728 ±0.162    1.808 ±0.010    -0.66 ±0.27    42.88     -0.686
2XMM J131104.6+272806        0.998     0.240   1.272 ±0.070    1.702 ±0.010    -0.34 ±0.07    42.94     -0.784
2XMM J132419.8+053704^CT     0.997     0.203   0.441 ±0.027    1.226 ±0.009     0.01 ±0.47    42.24     -1.290
2XMM J171350.7+572955^CT     0.883     0.113   1.279 ±0.025    1.396 ±0.006    -0.69 ±0.21    42.10     -1.201

X-ray selected QSO2
2XMM J113148.6+311400        0.995     0.50    0.724 ±0.491    1.654 ±0.091     0.71 ±0.16    44.70      1.437
2XMM J122656.4+013124        0.998     0.73    0.970 ±0.256    1.134 ±0.056     0.76 ±0.06    44.83      1.063
2XMM J134656.6+580316        0.965     0.37    1.472 ±0.538    2.443 ±0.069     0.53 ±0.23    43.88      0.057
2XMM J160645.9+081523        0.960     0.62    0.919 ±1.085    2.565 ±0.216     0.81 ±0.26    44.80      1.607
2XMM J204043.2-004548        0.336     0.62    1.448 ±2.334    2.581 ±0.304     0.70 ±0.11    44.72      1.518

Note 4. The source 2XMM J122656.4+013124 is present both in the optically selected and in the X-ray selected QSO2 sample.



optical and X-ray properties (e.g. to understand if these latter
objects could also be classified as "dust reddened QSOs") is
beyond the scope of the present paper. The main result of this
exploration is that, although Compton Thin QSO2 do separate well
in the Lx / HR2 diagram, other rare kinds of objects occupy the
same locus.
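The selection used in this exploration (log(Lx) higher than 44, EPIC pn HR2 greater than 0.5, error on HR2 below 0.2) can be sketched as a simple filter. In the text the cuts are applied to the spectroscopic SDSS targets; here, purely for illustration, they are applied to a handful of rows copied from Table 9. The function and variable names are hypothetical.

```python
# Illustrative filter for the Lx / HR2 selection described in the text.
# Rows are (name, log10 Lx, pn HR2, error on HR2), copied from Table 9.
TABLE9_ROWS = [
    ("2XMM J005621.6+003235", 42.78, -0.56, 0.28),
    ("2XMMi J011522.2+001518", 44.26, 0.73, 0.04),
    ("2XMM J021047.0-100152", 44.40, 0.66, 0.12),
    ("2XMM J122656.4+013124", 44.83, 0.76, 0.06),
    ("2XMM J164131.6+385841", 44.96, 0.63, 0.05),
    ("2XMM J113148.6+311400", 44.70, 0.71, 0.16),
    ("2XMM J134656.6+580316", 43.88, 0.53, 0.23),
    ("2XMM J160645.9+081523", 44.80, 0.81, 0.26),
    ("2XMM J204043.2-004548", 44.72, 0.70, 0.11),
]


def select_hard_luminous(rows, min_log_lx=44.0, min_hr2=0.5, max_hr2_err=0.2):
    """Return the names of sources passing all three cuts (strict inequalities)."""
    return [
        name
        for name, log_lx, hr2, hr2_err in rows
        if log_lx > min_log_lx and hr2 > min_hr2 and hr2_err < max_hr2_err
    ]


if __name__ == "__main__":
    for name in select_hard_luminous(TABLE9_ROWS):
        print(name)
```

Note that the cuts say nothing about the identification probability: 2XMM J204043.2-004548 passes them although its identification probability in Table 9 is only 0.336, so a probability threshold would have to be applied separately.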

Finally we investigated the possible presence of Compton
Thin QSO2 candidates in the SDSS photometric matches. We
did this by looking for sources showing evidence of photoelectric
absorption among the 2XMMi sources matching mainly optically
extended SDSS objects characterised by a high fx/fr ratio
and relatively red g - i colours. We thus built histograms of
the EPIC pn hardness ratios for all 2XMMi/SDSS matches with
log(fx/fr) > -0.5 for various g - i colour intervals. The resulting
histogram for the EPIC pn HR2 hardness ratio is shown in
Fig. 11.

The histograms clearly show a main peak of hardness ratio
corresponding to the canonical low NH, Γ = 1.9 power-law spectrum
characteristic of the type 1 QSOs for all g - i colour indexes.
However, for the reddest objects, typically with g - i ≳ 1.5, a
secondary bump is observed at harder hardness ratios (HR2 ≈ 0.5)
with values consistent with those of the Compton Thin QSO2;
a similar secondary bump is also present if we use the hardness
ratios HR3 (HR3 ≈ 0.35) and HR4 (HR4 ≈ 0.15). A detailed
investigation of these possibly interesting sources is beyond the
scope of the present paper.

8.2. X-ray active stars 
Fig. 10. Variation of the EPIC pn hardness ratio HR2 with X-ray
luminosity for extragalactic spectroscopic SDSS targets, optically
unresolved (black dots) and optically extended (green
dots). Magenta: Type 2 QSO (filled squares = X-ray selected,
filled triangles = optically selected, encircled = Compton Thick).
Errors on hardness ratios are below 0.2 in all cases.



At the high galactic latitudes covered here by the legacy SDSS
survey (|b| ≳ 20° with a mean |b| of 58°), most of the X-ray
sources are of extragalactic origin. For instance, the Extended
Medium Sensitivity Survey, which constitutes the largest optically
identified sample of serendipitous Einstein X-ray sources at
high galactic latitude, contains only 25% of active stars (Stocke
et al. 1991). Optical identification campaigns of the ROSAT all-sky
survey sources (RASS, see Schwope et al. 2000 for the
bright sample) yielded similar results (e.g. 35% of active stars
in Zickgraf et al. (1997)), while the ROSAT deep survey of
the Lockman field (Schmidt et al. 1998), which is ~50 times
more sensitive than the RASS, collected less than 10% of stellar
sources. Stars constitute a bounded population of comparatively
soft X-ray sources. Consequently, their relative contribution to
the high galactic latitude source number counts is expected to
decrease very significantly with increasing sensitivity in Chandra and
XMM-Newton observations, which both offer a lower flux limit
and harder energy response (see for instance Fig. 1 in Herent
et al. (2006)). Active coronae indeed constitute less than 10% of
all sources identified in the XMM-Newton serendipitous survey
of Barcons et al. (2007). The deepest Chandra surveys such as
the Chandra Deep Field North (Alexander et al. 2003) count
only ~3% of stars.

Fig. 11. Variation of the EPIC pn hardness ratio HR2 with g - i
colour for all 2XMMi matches with extended sources in the
entire SDSS catalogue having a log(fx/fr) larger than -0.5.
Each histogram corresponds to a g - i colour bin of 0.5. Black:
g - i = 0.25, red: 0.75, green: 1.25, blue: 1.75, magenta: 2.25.
Errors on hardness ratios are below 0.3 in all cases. We only
show identifications with an individual identification probability
higher than 90%, g and i mag brighter than 22.2, errors on g - i
less than 0.2. Histograms are normalised to the total number of
sources in the colour bins shown in Table 3.

This is at variance with the low latitude situation where most
of the RASS X-ray sources were found to be associated with
stars (Motch et al. 1997). XMM-Newton and Chandra have
extended to lower fluxes the predominance of active coronae in
the soft X-ray low latitude source population (Motch et al. 2003;
Rogel et al. 2006). At higher energies, a longitude dependent
population of galactic hard X-ray sources appears on the top of
a usually dominant background of extragalactic sources (Hands
et al. 2004; Motch 2006; Motch et al. 2010).

As stated above, the number of X-ray emitting and
spectroscopically identified stars available in the DR7 is relatively
small and, because of the scientific goals put forward at the time
of the selection of targets for spectroscopic follow-up, concentrates
on the reddest M type stars. The availability of the SEGUE
archive in DR7 has somewhat increased the number of
spectroscopically identified stars, but its effect on the cross-correlation
statistics remains small. As mentioned in Sect. 6, in order to
increase the stellar sample towards earlier types, we used a kernel
density classification to identify the SDSS DR7 / 2XMMi
matches with multicolour properties consistent with those
expected from stars of main sequence class.

Figure 12 shows the distribution in galactic coordinates of
all X-ray active stars present in the identified sample. Including
SEGUE data in the DR7 has allowed the identification of a few
stars at low galactic latitude. However, the mean |b| of the stellar
sample remains high (|b| ~ 45°) and therefore does not change
the conclusion that the present stellar sample is typical of the
high Galactic latitudes.

Fig. 12. Aitoff projection of the distribution of the 2XMMi
sources matching identified stars with a probability > 90%.

At high b and typical distances of a few hundred parsecs,
interstellar absorption remains negligible compared to other
uncertainties and has little effect on the observed stellar colours.
We computed the total galactic reddening in the directions of
each of the X-ray emitting stars following Schlegel et al. (1998).
The average E(B-V) is 0.029 with a rms of 0.025. Assuming the
absorption coefficients computed by Girardi et al. (2004) (Teff =
4500 K; log g = 4.5), the maximum reddening applicable on
average to our sample of stars would be 0.028 and 0.018 in the
g - r and r - i colours respectively. Only very few spatially
unresolved SDSS sources matching 2XMMi entries have combined
u - g, g - r and r - i colours compatible with those expected
from giant class III stars, not to mention supergiants. In a
recent paper, Guillout et al. (2009) estimate that their sample of
|b| ≲ 30° RASS sources identified with bright Tycho stars has
a mean contamination of 35% by evolved stars with a peak at
60% for K stars. However, a smaller fraction of X-ray emitting
evolved stars of ~10% was present in the sample of Covey et al.
(2008), which is more representative of what we should expect
in our case because it was selected at higher galactic latitude and
fainter flux. Below, we will assume that all stars belong to the
main sequence, keeping in mind that a fraction of the class III
and class IV stars, in particular in short period binaries such as
RS CVn systems, could contribute to some extent. Being
considered as single dwarfs, these stars would have computed X-ray
luminosities below that actually emitted.

Our sample comprises 549 active coronae candidates with
individual probability of identification above 90%, corresponding
to a total sample reliability of 98%. The interval of g - r and
g - i colours corresponds to K4 to late M5-M6 stars. Earlier stars
are more often subject to optical saturation and have in general
lower KDC probabilities. They are therefore excluded from the
clean identified stellar sample built here.

Neglecting reddening effects, we computed the distances using
the absolute magnitude calibration listed in Covey et al.
(2007) and the X-ray luminosity using the broad band (0.2-12 keV)
flux listed in the 2XMMi catalogue for the EPIC camera. The
mean photometric distance is 340 pc and all stars have distances
in the range of 40 pc to 2000 pc. The overall Log(Lx) (0.2-12 keV)
distribution peaks at 29.20 and ranges from ~27.1 to
~30.9. This interval of X-ray luminosity covers that exhibited
by old stars such as the Sun or even less active, up to that emitted
by the most active T Tauri or RS CVn stars. The mean X-ray
luminosity does not vary with galactic latitude in the interval
covered by our sample.
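The two steps described above can be sketched with the standard distance-modulus and inverse-square relations, neglecting reddening as in the text. This is a minimal illustration: the function names are hypothetical, the absolute magnitude is taken as an input rather than read from the Covey et al. (2007) calibration, and the example flux is an assumed value.

```python
import math

PC_IN_CM = 3.0857e18  # one parsec in centimetres


def photometric_distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus m - M = 5 log10(d) - 5."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)


def log_xray_luminosity(flux_cgs, distance_pc):
    """log10 of Lx = 4 pi d^2 fx, with fx in erg cm^-2 s^-1 and d in parsecs."""
    distance_cm = distance_pc * PC_IN_CM
    return math.log10(4.0 * math.pi * distance_cm ** 2 * flux_cgs)


if __name__ == "__main__":
    # Assumed flux of 1e-14 erg cm^-2 s^-1 at the mean distance of 340 pc.
    print(round(log_xray_luminosity(1.0e-14, 340.0), 2))
    # A star with m - M = 5 lies at 100 pc.
    print(photometric_distance_pc(15.0, 10.0))
```

With these relations, a flux of about 10^-14 erg cm^-2 s^-1 at the mean distance of 340 pc corresponds to Log(Lx) close to 29.1, near the peak of the distribution quoted above. An absolute magnitude that is too faint (a giant treated as a dwarf) shortens the distance and lowers the derived luminosity.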

We show in Fig. 13 the distribution of X-ray active stars in
the g - r / r - i diagram for two ranges of X-ray luminosities. It can
be readily seen that the locus of X-ray bright sources is shifted
by ~0.1 mag in colour above that occupied by low Lx active
coronae and by spectroscopically identified stars from SEGUE
in general. Active stars appear to be bluer in g - r for a given
r - i. The effect is particularly clear for the reddest stars with
g - r ≳ 1.3, corresponding to M0 types and later. This shift cannot
be due to the lack of reddening correction since the most
X-ray luminous stars, which are expected to be the most remote
and absorbed ones, should appear redder than the closest low Lx
stars, a trend opposite to what is observed. We also checked with
the isochrones of Girardi et al. (2004) designed for the SDSS
band passes that age or metallicity effects were unable to explain
the different colour/colour tracks of the low and high Lx stars.
Similarly, an enhanced Hα emission expected to be especially
important for late M stars (g - r > 1.3) can be excluded since
it would yield larger g - r. Interestingly, Covey et al. (2008)
report that the counterparts of their extended ChaMP stellar X-ray
survey do exhibit a ~0.1 bluer u - g index than average low-mass
stars, although they do not report any significant change in
the g - r colour index. Comparing with their work is however
not straightforward. While the colour/colour track followed on
average by our active coronae is consistent with that of Covey
et al. (2007) for the entire SDSS and therefore does not
contradict the results of Covey et al. (2008), the difference we
find rather arises between two groups of active stars. The SDSS
M star templates compiled by Bochanski et al. (2007) could
indicate a similar trend in the u - g colour index between active
and inactive stars, with however a large intrinsic scatter, while no
such effect occurs in g - r. Unfortunately, we have too few good u
band measurements to be able to confirm the u - g trend seen by
Covey et al. (2008). We agree with these authors that low-level
optical flaring might be responsible for the bluer colours seen in
X-ray active M dwarfs.

Active coronae emit essentially thin thermal spectra dominated
by a series of narrow emission lines superposed on a weak
continuum. In most cases, two thermal components are required
to satisfactorily represent the observed energy distribution (see
a recent review in Güdel & Nazé 2009). X-ray studies of open
clusters and field stars of different ages led to a relatively coherent
picture linking stellar rotation rates, overall X-ray luminosity
and X-ray temperatures. Whereas the young (age < 1 Myr)
stars in Orion exhibit X-ray spectra with kT1 ~ 0.8 keV and
kT2 ~ 2.9 keV, the analysis of ~115 Myr old Pleiades stars
yields kT1 ~ 0.4 keV and kT2 ~ 1.1 keV, while the X-ray corona
of our Sun can also be characterised by a 2T spectrum with kT1 ~
0.2 keV and kT2 ~ 0.6 keV (see Sung et al. 2008, and references
therein). Using ROSAT and ASCA observations of a dozen
carefully selected stars, Guedel et al. (1997) established that
the overall X-ray luminosity, the temperature of the two components,
and the emission measure ratio of the hot to the
cool plasma were all decreasing with age and rotation rate. The
range of X-ray luminosity observed in our survey suggests ages
younger than ~2 Gyr.




Fig. 13. Distribution of X-ray active stars in the g - r / r - i
diagram for two ranges of X-ray luminosities. Black dots:
Log(Lx) < 29.3. Red crosses: Log(Lx) > 29.3. The arrow shows the
direction of interstellar reddening. Only sources with photometric
errors below 0.1 mag in g - r and r - i are shown. Density
contours show the position of all stars identified in the SEGUE
programme.



We thus investigated whether the X-ray properties of our
identified active coronae depend on luminosity. For
that purpose we selected the 296 (149) identified stellar X-ray
sources having errors on 2XMMi hardness ratio 2 (3) of less than
0.2. This sample was then split into five ranges of luminosity for
which the mean HR2 was computed and, in order to accommodate
the lower number of sources in the harder band, into three
bands of X-ray luminosity for HR3. Hardness ratios 2 and 3
measure the relative count rates in the energy ranges 0.5-1.0 keV
and 1.0-2.0 keV, and 1.0-2.0 keV and 2.0-4.5 keV, respectively.
They are therefore weakly dependent on the EPIC filter used
and vary little within the range of NH applicable to the present
survey. HR2 and HR3 can thus be considered as indicators of
the intrinsic shape of the X-ray energy distribution well suited to
the range of temperatures exhibited by stellar coronae. We show
in Fig. 14 the variation of the median HR2 and HR3 with X-ray
luminosity. A clear spectral hardening accompanies the luminosity
increase. Assuming a single temperature plasma undergoing
NH ~ 1.7 × 10^20 cm^-2, the HR2 value would imply a thin thermal
temperature of ~0.45 keV for the lowest Lx bin and 0.72 keV for
the largest Lx, while for the same range of X-ray luminosities
the HR3 value would indicate temperatures from ~0.45 keV to
> 1.4 keV. The existence of a second hotter thin thermal component
naturally accounts for this discrepancy. Unfortunately,
using only two hardness ratios, it is impossible to fit kT1,
kT2 and the ratio of the emission measures of the hot and
cool components, which is also expected to vary with X-ray
luminosity (Guedel et al. 1997). In spite of this shortcoming, our
data, which benefit from the large throughput of XMM-Newton
and from its capability to obtain detailed spectral information for
an unprecedented number of sources, confirm for the first time
Fig. 14. Variation of the EPIC pn hardness ratios 2 and 3 with
mean X-ray luminosity for stellar X-ray sources with HR errors
below 0.2. We plot here the median and associated errors.



the spectral hardening with X-ray luminosity in large groups of 
mixed age field objects. 
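The binning step behind this result (splitting the stellar sample into luminosity ranges and taking a central hardness ratio per range) can be sketched as below. The function name, the bin edges and the input values are all hypothetical and do not reproduce the sample or the bins actually used; the median is shown because it is the statistic plotted in Fig. 14.

```python
from statistics import median


def median_hr_by_luminosity(log_lx_values, hr_values, bin_edges):
    """Return (low edge, high edge, source count, median HR) for each bin.

    Bins are half-open intervals [low, high); empty bins report None.
    """
    summary = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [hr for lx, hr in zip(log_lx_values, hr_values) if low <= lx < high]
        summary.append((low, high, len(in_bin), median(in_bin) if in_bin else None))
    return summary


if __name__ == "__main__":
    # Invented values for six stars, for illustration only.
    log_lx = [28.0, 28.5, 29.1, 29.4, 29.9, 30.2]
    hr2 = [-0.6, -0.4, -0.3, -0.1, 0.0, 0.2]
    for low, high, count, med in median_hr_by_luminosity(log_lx, hr2, [28, 29, 30, 31]):
        print(f"{low}-{high}: n={count}, median HR2={med}")
```

A median that rises from one luminosity bin to the next is what the text describes as spectral hardening with increasing X-ray luminosity.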



9. Summary 

We present the results of the cross-identification of the
2XMMi catalogue containing over 220,000 XMM-Newton
EPIC serendipitous sources with the Data Release 7 of the Sloan
Digital Sky Survey consisting of 357 million unique objects and
over 1.6 million spectra. In order to ensure the best statistical
control on the quality of the cross-correlation process, we use
a likelihood ratio scheme only based on the probability of spatial
coincidence of the X-ray source with the optical candidate.
Using an original method that does not rely on heavy Monte
Carlo simulations, we are able to compute true probabilities of
identification, taking into account the varying fraction of X-ray
sources expected to have an SDSS match as a function of galactic
latitude. We discuss the details of the statistical method used and
present reliability and completeness curves for the entire set of
matches. A total of 30,000 unique X-ray sources have a match
in the DR7 with a probability of identification above 90%. At
this threshold, the reliability of the total sample is 98%, i.e.
we expect only 2% of spurious cross-identifications, while the
completeness is 77%, meaning that we miss about a quarter of
the true matches, which therefore appear with individual
probabilities of identification below the 90% threshold.
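If the individual identification probabilities are well calibrated, reliability and completeness at a given threshold can be estimated directly from them, as in the sketch below. This is a generic estimator written under that assumption, not necessarily the exact procedure followed in the paper; the function name and the example probabilities are invented for illustration.

```python
def reliability_and_completeness(probabilities, threshold):
    """Estimate sample reliability and completeness above a probability cut.

    reliability  = expected true matches among selected / number selected
    completeness = expected true matches among selected / expected true
                   matches over all candidates
    """
    selected = [p for p in probabilities if p > threshold]
    expected_true_total = sum(probabilities)
    if not selected or expected_true_total <= 0:
        raise ValueError("no candidate above threshold or no expected match")
    expected_true_selected = sum(selected)
    reliability = expected_true_selected / len(selected)
    completeness = expected_true_selected / expected_true_total
    return reliability, completeness


if __name__ == "__main__":
    # Invented identification probabilities for four candidate matches.
    example = [0.99, 0.95, 0.50, 0.06]
    rel, comp = reliability_and_completeness(example, threshold=0.90)
    print(f"reliability={rel:.3f}, completeness={comp:.3f}")
```

Raising the threshold increases the reliability at the cost of completeness, which is the trade-off summarised by the 98% and 77% figures quoted at the 90% threshold.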

One of the most ambitious tasks attributed to the Survey Science Centre of the XMM-Newton satellite is the statistical identification of all serendipitous X-ray sources discovered in the large field of view of the EPIC cameras. The 2XMMi/DR7 cross-identified sample offers an interesting opportunity to test various strategies and methods which would eventually lead to the classification of the XMM catalogues. Pineau et al. (2008) presented a first attempt to reduce the parameter space dimension using Principal Component Analysis tools adapted to data affected by rather large instrumental errors and taking into account the intrinsic scatter of the measured quantities. These authors also tested the relative merits of various classification methods. We use the SDSS DR7 spectroscopic catalogue to build an identified sample made of objects of known nature, to which we add a few entries extracted from specific additional catalogues (e.g. Cataclysmic Variables). We find that the most discriminating diagrams involve the fx/fr flux ratio and various other parameters such as the g - i colour index, EPIC hardness ratios, and the extent of the optical source. Active galactic nuclei more luminous than 10^42 erg s^-1 have fx/fr ratios well above those of most galactic stars in their range of g - i. Galaxies of lower X-ray luminosity overlap the region of the diagram covered by active coronae, but can be easily separated from them on the basis of the optical extent.

Not unexpectedly, active galactic nuclei constitute by far the most numerous class of X-ray emitters present in the cross-correlation. Optically unresolved UV excess quasars are the most frequent X-ray emitters encountered and cluster in a well defined area in the (fx/fr) / (g - i) diagram. The second largest group of X-ray detected AGN consists of optically extended objects, with similar or only slightly lower fx/fr ratios, but exhibiting considerably redder g - i colours. The very scarce SDSS spectroscopic coverage of this region does not allow their true nature to be constrained. However, relatively bright galaxies harbouring active nuclei with Lx somewhat lower than those of UV excess quasars are likely to account for most of these objects. The vast majority of these "red" AGN exhibit X-ray hardness ratios consistent with those expected from a typical type I nucleus. However, a small fraction of these sources display evidence of enhanced photo-electric absorption that could be the signature of Compton Thin QSO2s, although other species of AGN can overlap the same parameter region. As Lx (and fx/fr) decrease, extragalactic X-ray sources become spatially resolved in the optical and their g - i colour indices tend to cluster around g - i ~ 1.3. In parallel, the hardness ratios of lower Lx sources appear globally softer. We probably see here the effect of the increasing contribution of non-nuclear X-ray sources. Although some stars share fx/fr ratios and g - i colours similar to those of some galaxies, taking into account the extent of the optical object allows an almost perfect separation of the galactic and extragalactic source populations. As expected, the latest M stars display larger fx/fr ratios than earlier active coronae.

Using a sample of reliable X-ray and optical Compton Thin and Compton Thick QSO2s, we investigated the possibility of preferentially finding these objects in some specific range of X-ray and optical parameters. We find that Compton Thick QSO2s lack outstanding observational optical and X-ray photometric signatures that may be used to select sub-samples with greatly enhanced densities. However, the photo-electric absorption detectable in X-rays for relatively low-redshift AGN could be used to set apart groups of Compton Thin candidates with a reasonable success rate.

We also find that X-ray bright stars appear slightly bluer in g - r than less active coronae. The origin of this effect is not clear, but could be related to the UV and blue micro-flaring occurring in many of the active late-type K and M stars. Assuming that most active stars are indeed on the main sequence, we find a strong dependence of EPIC hardness ratios 2 and 3 on the inferred X-ray luminosity, the most luminous coronae being those emitting at the highest temperature.

The cross-correlation of two large catalogues such as the 2XMMi and the DR7 of the SDSS paves the way to a wide range of investigations. In addition, the SDSS database contains line fluxes for over a million spectroscopic targets, while the 2XMMi catalogue provides X-ray spectra and time series for the 44,000 brightest serendipitous XMM-Newton sources. These rich data sets offer unprecedented opportunities for studying how the micro-physics of the optically emitting regions relates to X-ray properties. Ultimately, such studies, in particular those aiming at finding rare outliers, rely on a solid control of the cross-correlation and cross-identification statistics.

We presented the details of an algorithm based on the classical likelihood ratio, well adapted to the cross-correlation of large catalogues and providing useful probabilities of identification. This algorithm is now available as a plug-in for the Aladin VO portal.

Appendix A: Transformation of error ellipses in the new frame

A.1. Spherical trigonometry 

On the unit sphere of centre C, we consider the spherical triangle NXO, where N is the north pole and X and O are the positions of the X-ray and the optical sources respectively. We write

- $x$, $n$, $o$ the angular distances between N and O, O and X, N and X respectively.

- $N$, $X$ and $O$ the dihedral angles between planes NCX and NCO, XCN and XCO, OCN and OCX respectively. Those angles are in the range $]-180^\circ, 180^\circ]$ and are defined as positive in the East direction.

We immediately have $N = \alpha_o - \alpha_x$, $x = 90^\circ - \delta_o$, $o = 90^\circ - \delta_x$ and $n = d$, the angular distance between the two sources, which is given by the Haversine formula

$$ d = 2 \arcsin \sqrt{ \sin^2\frac{\delta_o - \delta_x}{2} + \sin^2\frac{\alpha_o - \alpha_x}{2} \cos\delta_x \cos\delta_o } \qquad (\mathrm{A.1}) $$

Spherical trigonometry gives the formula

$$ \frac{\sin N}{\sin n} = \frac{\sin X}{\sin x} = \frac{\sin O}{\sin o} \qquad (\mathrm{A.2}) $$

As $n$, $x$ and $o$ are positive, we notice that $N$, $X$ and $O$ have the same sign, which depends on $\alpha_o - \alpha_x$. As the function arcsin returns values $\in [-90^\circ, +90^\circ]$, we deduce two possible values for both angles $X$ and $O$. The cosines of the half-angles $X/2$ and $O/2$ tell us whether $|X| > 90^\circ$ and whether $|O| > 90^\circ$ respectively. From the spherical trigonometry formulae

$$ s = \frac{1}{2}(n + x + o) \qquad (\mathrm{A.3}) $$

$$ \cos\frac{X}{2} = \sqrt{\frac{\sin s \, \sin(s - x)}{\sin n \, \sin o}} \qquad (\mathrm{A.4}) $$

$$ \cos\frac{O}{2} = \sqrt{\frac{\sin s \, \sin(s - o)}{\sin n \, \sin x}} \qquad (\mathrm{A.5}) $$



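We derive the values of angles $X$ and $O$:

$$ X = \begin{cases}
180^\circ - \arcsin\dfrac{\sin x \, \sin N}{\sin n} & \text{if } \cos\dfrac{X}{2} < \dfrac{\sqrt{2}}{2} \text{ and } \arcsin\dfrac{\sin x \, \sin N}{\sin n} > 0 \\
-180^\circ - \arcsin\dfrac{\sin x \, \sin N}{\sin n} & \text{if } \cos\dfrac{X}{2} < \dfrac{\sqrt{2}}{2} \text{ and } \arcsin\dfrac{\sin x \, \sin N}{\sin n} < 0 \\
\arcsin\dfrac{\sin x \, \sin N}{\sin n} & \text{else}
\end{cases} \qquad (\mathrm{A.6}) $$

$$ O = \begin{cases}
180^\circ - \arcsin\dfrac{\sin o \, \sin N}{\sin n} & \text{if } \cos\dfrac{O}{2} < \dfrac{\sqrt{2}}{2} \text{ and } \arcsin\dfrac{\sin o \, \sin N}{\sin n} > 0 \\
-180^\circ - \arcsin\dfrac{\sin o \, \sin N}{\sin n} & \text{if } \cos\dfrac{O}{2} < \dfrac{\sqrt{2}}{2} \text{ and } \arcsin\dfrac{\sin o \, \sin N}{\sin n} < 0 \\
\arcsin\dfrac{\sin o \, \sin N}{\sin n} & \text{else}
\end{cases} \qquad (\mathrm{A.7}) $$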
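The steps of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `angular_distance` and `angle_at_x` are hypothetical, all inputs are in degrees, the sine rule is used in its standard spherical form, and the X-ray source is assumed not to lie exactly at a celestial pole.

```python
import math

def angular_distance(ra_x, dec_x, ra_o, dec_o):
    """Haversine distance d between two positions, Eq. (A.1); angles in degrees."""
    ax, dx, ao, do = (math.radians(v) for v in (ra_x, dec_x, ra_o, dec_o))
    h = (math.sin((do - dx) / 2) ** 2
         + math.sin((ao - ax) / 2) ** 2 * math.cos(dx) * math.cos(do))
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

def angle_at_x(ra_x, dec_x, ra_o, dec_o):
    """Dihedral angle X in degrees, positive towards the East, Eq. (A.6)."""
    # N = alpha_o - alpha_x, wrapped to [-180, 180) degrees (assumption of this sketch)
    big_n = math.radians((ra_o - ra_x + 180.0) % 360.0 - 180.0)
    x = math.radians(90.0 - dec_o)   # side opposite to X (distance N-O)
    o = math.radians(90.0 - dec_x)   # side opposite to O (distance N-X)
    n = math.radians(angular_distance(ra_x, dec_x, ra_o, dec_o))
    if n == 0.0:
        raise ValueError("the two positions coincide; X is undefined")
    s = 0.5 * (n + x + o)            # Eq. (A.3)
    # Sine rule (A.2), clamped against rounding errors before taking arcsin
    sin_x_angle = max(-1.0, min(1.0, math.sin(x) * math.sin(big_n) / math.sin(n)))
    base = math.degrees(math.asin(sin_x_angle))
    # Half-angle formula (A.4) tells whether |X| exceeds 90 degrees
    cos_half_sq = math.sin(s) * math.sin(s - x) / (math.sin(n) * math.sin(o))
    cos_half = math.sqrt(max(0.0, cos_half_sq))
    if cos_half < math.sqrt(2) / 2:
        return 180.0 - base if base > 0 else -180.0 - base
    return base
```

For an optical source lying to the south-east of the X-ray source, `angle_at_x` returns a value between 90° and 180°, whereas the bare arcsin alone would return its supplement.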



A.2. Transformation of an error ellipse into a 
variance-covariance matrix 

Our starting frame $F_s$ is the plane centred on X and perpendicular in X to (CX). It has as its x-axis the East direction and as its y-axis the North direction. Errors on positions in astronomy are almost always given in the form of an ellipse of major axis $a$, minor axis $b$ and position angle $\theta$ between the north pole and the part of the major axis oriented in the East direction, so $\theta \in [0^\circ, 180^\circ[$. We can transform the ellipse into a variance-covariance matrix in the frame $F_s$. The frame in which the ellipse is in its canonical form is obtained by rotating the frame $F_s$ by an angle $\beta = 90^\circ - \theta$. Applying a rotation of angle $-\beta$ to the canonical variance-covariance matrix gives us the expression of the variance-covariance matrix in $F_s$:

$$ \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix}
\begin{pmatrix} a^2 & 0 \\ 0 & b^2 \end{pmatrix}
\begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} \qquad (\mathrm{A.8}) $$

It leads to

$$ \sigma_x^2 = a^2 \sin^2\theta + b^2 \cos^2\theta \qquad (\mathrm{A.9}) $$

$$ \sigma_y^2 = a^2 \cos^2\theta + b^2 \sin^2\theta \qquad (\mathrm{A.10}) $$

$$ \rho\sigma_x\sigma_y = \cos\theta \, \sin\theta \, (a^2 - b^2) \qquad (\mathrm{A.11}) $$

A.3. Variance-covariance matrices in our new frame



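As mentioned in Sect. 3.1, our new frame $F_n$ is centred on X, has as its x-axis the direction of the perpendicular in X to (CO) in the plane XCO, oriented towards the East, and as its y-axis the perpendicular in X to the plane XCO, oriented towards the North. Given the results of the previous section, the new frame is obtained from the starting one ($F_s$) by a rotation of angle $\Omega = 90^\circ - X$ if $X$ is positive and $\Omega = -90^\circ - X$ if $X$ is negative. The variance-covariance matrix of the errors on positions can be expressed in the new frame $F_n$ by a matrix $V_n$

$$ V_n = \begin{pmatrix} \cos\Omega & \sin\Omega \\ -\sin\Omega & \cos\Omega \end{pmatrix}
\begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}
\begin{pmatrix} \cos\Omega & -\sin\Omega \\ \sin\Omega & \cos\Omega \end{pmatrix}, \qquad (\mathrm{A.12}) $$

leading to

$$ \sigma_{x_n}^2 = \sigma_x^2 \cos^2\Omega + \sigma_y^2 \sin^2\Omega + 2 \cos\Omega \, \sin\Omega \, \rho\sigma_x\sigma_y \qquad (\mathrm{A.13}) $$

$$ \sigma_{y_n}^2 = \sigma_x^2 \sin^2\Omega + \sigma_y^2 \cos^2\Omega - 2 \cos\Omega \, \sin\Omega \, \rho\sigma_x\sigma_y \qquad (\mathrm{A.14}) $$

$$ \rho_n\sigma_{x_n}\sigma_{y_n} = \cos\Omega \, \sin\Omega \, (\sigma_y^2 - \sigma_x^2) + (\cos^2\Omega - \sin^2\Omega) \, \rho\sigma_x\sigma_y \qquad (\mathrm{A.15}) $$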
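As an illustration of Sects. A.2 and A.3, the following Python sketch converts an error ellipse into its variance-covariance terms and expresses them in the rotated frame. The function names are hypothetical, angles are in degrees, and treating $X = 0$ as a positive angle is an arbitrary choice made for this example only.

```python
import math

def ellipse_to_covariance(a, b, theta_deg):
    """Return (sigma_x^2, sigma_y^2, rho*sigma_x*sigma_y) in F_s, Eqs. (A.9)-(A.11)."""
    t = math.radians(theta_deg)
    var_x = a ** 2 * math.sin(t) ** 2 + b ** 2 * math.cos(t) ** 2
    var_y = a ** 2 * math.cos(t) ** 2 + b ** 2 * math.sin(t) ** 2
    cov = math.cos(t) * math.sin(t) * (a ** 2 - b ** 2)
    return var_x, var_y, cov

def rotation_angle(x_angle_deg):
    """Angle Omega between F_s and F_n for a given dihedral angle X (Sect. A.3)."""
    # X == 0 is treated as positive here: an assumption, the text leaves it open
    return 90.0 - x_angle_deg if x_angle_deg >= 0 else -90.0 - x_angle_deg

def rotate_covariance(var_x, var_y, cov, omega_deg):
    """Variance-covariance terms in the frame rotated by Omega, Eqs. (A.13)-(A.15)."""
    c = math.cos(math.radians(omega_deg))
    s = math.sin(math.radians(omega_deg))
    var_xn = var_x * c * c + var_y * s * s + 2 * c * s * cov
    var_yn = var_x * s * s + var_y * c * c - 2 * c * s * cov
    cov_n = c * s * (var_y - var_x) + (c * c - s * s) * cov
    return var_xn, var_yn, cov_n
```

When the major axis of the ellipse points towards the optical candidate (for instance $\theta = X = 45^\circ$), the rotated matrix becomes diagonal, with $a^2$ along the new x-axis.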

A.4. Convolution product of two independent 2D Gaussians 

The errors on the X-ray and the optical source are two Gaussian random variables $Z_x$ and $Z_o$ with variance-covariance matrices

$$ V_{Z_x} = \begin{pmatrix} \sigma_{x_x}^2 & \rho_x\sigma_{x_x}\sigma_{y_x} \\ \rho_x\sigma_{x_x}\sigma_{y_x} & \sigma_{y_x}^2 \end{pmatrix}, \quad
V_{Z_o} = \begin{pmatrix} \sigma_{x_o}^2 & \rho_o\sigma_{x_o}\sigma_{y_o} \\ \rho_o\sigma_{x_o}\sigma_{y_o} & \sigma_{y_o}^2 \end{pmatrix} \qquad (\mathrm{A.16}) $$

The random variable $Z = Z_x + Z_o$ is defined by the convolution product of $Z_x$ by $Z_o$. We know that the sum of two independent Gaussians is a Gaussian with a variance-covariance matrix:

$$ V_Z = \begin{pmatrix} \sigma_{x_x+x_o}^2 & \rho\sigma_{x_x+x_o}\sigma_{y_x+y_o} \\ \rho\sigma_{x_x+x_o}\sigma_{y_x+y_o} & \sigma_{y_x+y_o}^2 \end{pmatrix} \qquad (\mathrm{A.17}) $$

As $Z_x$ and $Z_o$ are independent, $x_x$ and $x_o$ are independent. Idem for $y_x$ and $y_o$. We thus have $\mathrm{Cov}(x_x, x_o) = \mathrm{Cov}(y_x, y_o) = 0$. The variance formula $\mathrm{Var}(a + b) = \mathrm{Var}(a) + \mathrm{Var}(b) + 2\,\mathrm{Cov}(a, b)$ leads to

$$ \sigma_{x_x+x_o}^2 = \sigma_{x_x}^2 + \sigma_{x_o}^2 \qquad (\mathrm{A.18}) $$

$$ \sigma_{y_x+y_o}^2 = \sigma_{y_x}^2 + \sigma_{y_o}^2 \qquad (\mathrm{A.19}) $$

The covariance is given by $\rho\sigma_{x_x+x_o}\sigma_{y_x+y_o} = E\{(x_x + x_o)(y_x + y_o)\} - E\{x_x + x_o\} E\{y_x + y_o\}$. But as our four 1D Gaussians $x_x$, $x_o$, $y_x$ and $y_o$ are centred, their mean is null and thus $E\{x_x + x_o\} = E\{x_x\} + E\{x_o\} = 0$ and $E\{y_x + y_o\} = E\{y_x\} + E\{y_o\} = 0$. It leads to

$$ \rho\sigma_{x_x+x_o}\sigma_{y_x+y_o} = E\{x_x y_x\} + E\{x_o y_o\} + E\{x_x y_o\} + E\{y_x x_o\}, $$

which is a sum of covariances since our four distributions are centred. We already mentioned that the covariance of $x_x$ and $x_o$ is null. Idem for $y_x$ and $y_o$. For the same reason (independence of $Z_x$ and $Z_o$), the cross terms $E\{x_x y_o\}$ and $E\{y_x x_o\}$ vanish. We finally obtain

$$ \rho\sigma_{x_x+x_o}\sigma_{y_x+y_o} = \rho_x\sigma_{x_x}\sigma_{y_x} + \rho_o\sigma_{x_o}\sigma_{y_o} \qquad (\mathrm{A.20}) $$

Appendix B: Estimate of the local density

In the 1σ ellipse (Eq. 6), which is the elementary surface, the Poissonian density of sources is $\lambda = \pi \sigma_M \sigma_m f(\alpha_x, \delta_x, m_o)$. The accuracy of the likelihood ratio (Eq. 9) depends sensitively on the local density estimation $f(\alpha_x, \delta_x, m_o)$. A simple manner to estimate the local density of the SDSS-DR7 catalogue is to use a k nearest neighbour (knn) averaging. In this case, the estimated densities have a well known error: $\sqrt{k} / (\pi d_k^2)$, with $d_k$ the distance of the $k$-th neighbour from the X-ray source. However, knn averaging has the impeding property of being non-differentiable. In fact, two candidates with similar magnitudes, or two nearby sources with the same magnitude, could have quite different LRs. The Voronoi tessellation and the use of wavelets also seem too complicated and time-consuming for our simple purpose.

Considering these drawbacks, we preferred to use the sample-point density estimator, which is a kernel smoothing, using the Epanechnikov profile as kernel function and the distance to the 100th nearest neighbour as bandwidth.

Kernel smoothing, also called Parzen window technique or kernel density estimation, can be performed with a fixed or a variable bandwidth. Zhang et al. (2006) provide Markov Chain Monte Carlo (MCMC) algorithms for estimating optimal data-driven fixed bandwidths. These algorithms involve the creation of a density map. In our case we consider for each candidate the density of sources at least as bright as the candidate. We therefore should compute roughly as many density maps as the number of candidates and apply the algorithms several times. Since the MCMC technique in our case is time-consuming, it has been discarded.

We therefore investigated variable bandwidth kernel smoothing techniques. The simplest one is the balloon estimator (see Sain 2002). Unfortunately it usually fails to integrate to one. We therefore preferred a sample-point density estimator in which the estimate of the density at a point $x$ of a FOV containing sources at positions $X_i$, $i \in [[1, N]]$ is given by

$$ \hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h(X_i)^2} \, K\!\left( \frac{|x - X_i|}{h(X_i)} \right) \qquad (\mathrm{B.1}) $$



where $K$ is a kernel function and $h(X_i)$ a bandwidth depending on the position. According to Comaniciu et al. (2001), this estimator "is proved to be almost all the time much better than the fixed bandwidth estimator". The article gives a usual form of the variable bandwidth $h(X_i)$:

$$ h(X_i) = h_0 \left[ \frac{\lambda}{f(X_i)} \right]^{1/2} \qquad (\mathrm{B.2}) $$

where $h_0$ represents a fixed bandwidth, $\lambda$ a proportionality constant impacting on the smoothness and $f$ a pilot function. It is also mentioned that "the method is insensitive to the fine detail of the pilot estimate", and "a good initial choice is to take $\lambda$ as the geometric mean" of the pilot function.

The selected pilot function stems from a knn averaging: $f(X_i) = k / (\pi d_k^2)$, where $d_k$ is the distance to the $k$-th nearest neighbour at least as bright as the candidate. It is fast to compute and the discontinuities can be considered as negligible fine details. As suggested in Comaniciu et al. (2001), we define $\lambda$ as the mean value of the pilot function: $\lambda = N / (\pi R_{fov}^2)$, where $N$ is the total number of sources in the field of view with a magnitude lower than or equal to that of the candidate, and where $R_{fov}$ is the radius of the FOV. We used a fixed bandwidth, which corresponds to the area in which we find, on average, $k$ sources: $h_0 = \sqrt{k/N} \, R_{fov}$. This leads to $h(X_i) = d_k$, since $h_0 [\lambda / f(X_i)]^{1/2} = \sqrt{k/N} \, R_{fov} \sqrt{N d_k^2 / (k R_{fov}^2)}$. In order to complete our estimate of the local density we only need to define parameter $k$, the number of nearest neighbours being at least as bright as the candidate considered.

The smaller $k$ is, the more local the density is, but the larger the error on the density estimation is. Using $k = 100$ leads to a relative error on LR of about 15%, which corresponds to 6.5% for $\log_{10} LR$. In addition, we implemented an algorithm handling border effects.

The selected kernel profile is the 2D Epanechnikov profile $K_E$:

$$ K_E(t) = \begin{cases} \dfrac{2}{\pi} (1 - t^2) & \text{if } t < 1 \\ 0 & \text{else} \end{cases} \qquad (\mathrm{B.3}) $$

This truncated parabola allows computational optimisations thanks to its finite extent. Moreover, since it minimises the mean integrated square error (Comaniciu & Meer 2002), this kernel maximises the quality of the density reconstruction.
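The estimator described in this appendix can be illustrated with a brute-force Python sketch. It assumes planar coordinates, a kernel normalised to integrate to one, and a bandwidth $h(X_i)$ equal to the distance from source $i$ to its $k$-th nearest neighbour among the sources at least as bright as the candidate; the function names are hypothetical and no border correction is attempted.

```python
import math

def epanechnikov_2d(t):
    """2D Epanechnikov kernel of unit radius, normalised to integrate to one."""
    return 2.0 / math.pi * (1.0 - t * t) if t < 1.0 else 0.0

def kth_neighbour_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (brute force)."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def sample_point_density(x, sources, mag_limit, k):
    """Sample-point estimate at position x, keeping sources with mag <= mag_limit.

    `sources` is a list of (x, y, mag) tuples. The result integrates to one;
    multiply it by the number of selected sources to get a surface density.
    """
    points = [(sx, sy) for sx, sy, mag in sources if mag <= mag_limit]
    total = 0.0
    for i, p in enumerate(points):
        h = kth_neighbour_distance(points, i, k)      # variable bandwidth h(X_i) = d_k
        total += epanechnikov_2d(math.dist(x, p) / h) / h ** 2
    return total / len(points)
```

On a regular grid of unit spacing, the estimate multiplied by the number of sources stays close to one source per unit area away from the borders. The brute-force neighbour search is quadratic in the number of sources and is only meant to keep the example short.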
Appendix C: Some details on the likelihood ratio

We consider the case where an X-ray source has only one candidate and formulate two hypotheses:

$H_{cp}$: the candidate is the counterpart; it is not random.
$H_{spur}$: the candidate is a random source belonging to a Poissonian distribution of sources of density $\lambda$.

The union of these two hypotheses gives the set of all possibilities: $P(H_{cp}) + P(H_{spur}) = 1$, i.e. $H_{cp} = \overline{H_{spur}}$. The probability of finding a candidate at a distance $r$ of the X-ray source can therefore be written as:

$$ P(r) = P(r|H_{cp}) \, P(H_{cp}) + P(r|H_{spur}) \, P(H_{spur}) \qquad (\mathrm{C.1}) $$

From conditional probabilities

$$ P(H_{cp}|r) = \frac{P(H_{cp} \cap r)}{P(r)} \qquad (\mathrm{C.2}) $$

$$ P(r|H_{cp}) = \frac{P(H_{cp} \cap r)}{P(H_{cp})} \qquad (\mathrm{C.3}) $$

we deduce

$$ P(H_{cp}|r) = \frac{P(r|H_{cp}) \, P(H_{cp})}{P(r)} \qquad (\mathrm{C.4}) $$

which can be written (Eq. (C.1)) as:

$$ P(H_{cp}|r) = \frac{P(r|H_{cp}) \, P(H_{cp})}{P(r|H_{cp}) \, P(H_{cp}) + P(r|H_{spur}) \, P(H_{spur})} \qquad (\mathrm{C.5}) $$

We transform that equation into:

$$ P(H_{cp}|r) = \frac{1}{1 + \dfrac{1}{LR(r)} \dfrac{P(H_{spur})}{P(H_{cp})}} \qquad (\mathrm{C.6}) $$

to exhibit the likelihood ratio

$$ LR(r) = \frac{P(r|H_{cp})}{P(r|H_{spur})} \qquad (\mathrm{C.7}) $$

The remaining term $P(H_{spur}) / P(H_{cp})$ has to be estimated to compute $P(H_{cp}|r)$, the probability that the candidate is a real counterpart. Considering the estimate of the number of real counterparts divided by the number of spurious associations, it leads to the $\theta / (1 - \theta)$ factor of de Ruiter et al. (1977).
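As a small numerical illustration of Eq. (C.6), the Python sketch below converts a likelihood ratio into a probability of identification. The function name and the parameter `theta`, taken here as the prior probability $P(H_{cp})$ that the X-ray source has a counterpart, are assumptions of this example.

```python
def identification_probability(lr, theta):
    """P(H_cp | r) from Eq. (C.6), for a likelihood ratio lr and a prior theta."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if lr <= 0.0:
        return 0.0                       # a null likelihood ratio rules out the match
    prior_ratio = (1.0 - theta) / theta  # P(H_spur) / P(H_cp)
    return 1.0 / (1.0 + prior_ratio / lr)
```

With equal priors (`theta = 0.5`), a likelihood ratio of 9 gives a probability of identification of 0.9; the same probability is reached with a likelihood ratio of 1 when `theta = 0.9`.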

If we add a prior knowledge on the magnitude, the formula (C.6) is unchanged except that it now expresses $P(H_{cp}|r \cap m)$ and the LR becomes

$$ LR(r \cap m) = \frac{P(r \cap m|H_{cp})}{P(r \cap m|H_{spur})} = \frac{P(r|H_{cp}) \, P(m|H_{cp})}{P(r|H_{spur}) \, P(m|H_{spur})} \qquad (\mathrm{C.8}) $$

The term $P(r \cap m|H_{spur}) = P(r|H_{spur}) \, P(m|H_{spur})$ expresses the probability of having a spurious source of magnitude $m$. It is the Poissonian local distribution of sources of magnitude $m$. The term $P(r|H_{cp}) \, P(m|H_{cp})$ is the classical 2D Gaussian distribution times the probability of having, among all real counterparts, a counterpart of magnitude $m$. It can be seen as the distribution of counterpart sources according to their magnitudes and corresponds to the $q(m)$ factor in Brusa et al. (2007) and to $q(m, c)$ in Sutherland & Saunders (1992) if we ignore object types.



Appendix D: Computation of the spurious LR 
distribution 

Let us first consider the case of one X-ray and one optical source of magnitude $m$ in a FOV. The probability of observing the random X-ray source at a given point $(\alpha_x, \delta_x)$ of the FOV is given by the distribution $g_x(\alpha_x, \delta_x)$ of the X-ray sources in the FOV. The chance for the optical source of magnitude $m$ to be randomly associated with the X-ray source is given by the integral of the distribution of optical sources at least as bright as $m$ over the surface of the convolution ellipse $S_{x,o} = k^2 \pi \sigma_M \sigma_m$. The probability that the association has a LR in a bin $\Delta LR$ is then given by the integral of the distribution of optical sources at least as bright as $m$ over the surface the bin occupies inside the convolution ellipse $S_{x,o}$: $S_{x,o}(\Delta LR)$. Finally, the sum of these probabilities for all X-ray and archival sources provides an estimate of the number of spurious associations by LR bin. Considering now the case of $N$ X-ray and $M$ optical sources in the FOV leads to the formula

$$ N_{spur}(\Delta LR) = \sum_{x=1}^{N} \iint g_x(\alpha_x, \delta_x) \left[ \sum_{o=1}^{M} \iint_{S_{x,o}(\Delta LR)} g_o(\alpha_o, \delta_o, m) \, d\alpha_o \, d\delta_o \right] d\alpha_x \, d\delta_x, \qquad (\mathrm{D.1}) $$

where $S_{x,o}(\Delta LR)$ depends on the X-ray source, the optical source, and the position of the X-ray source and can therefore be written as $S_{x,o}(\Delta LR, x, o, \alpha_x, \delta_x)$.

A possible manner to make the problem simpler is to assume that $g_o(\alpha_o, \delta_o, m)$ is roughly constant inside the surface $S_{x,o}(\Delta LR)$. Eq. (D.1) can then be written as

$$ N_{spur}(\Delta LR) \simeq \iint g_x(\alpha_x, \delta_x) \left( \sum_{x=1}^{N} \sum_{o=1}^{M} S_{x,o}(\Delta LR) \, g_o(\alpha_x, \delta_x, m) \right) d\alpha_x \, d\delta_x \qquad (\mathrm{D.2}) $$

In order to estimate $N_{spur}(\Delta LR)$ we have to compute a complete distribution map of the X-ray sources and as many distribution maps as the number of distinct optical source magnitudes. Since such a computation is potentially time-consuming, we may suppose that optical sources are uniformly distributed in the FOV and have as local density the mean value of their local densities. This less rigorous approach leads to the simple formula

$$ N_{spur}(\Delta LR) \simeq \sum_{x=1}^{N} \sum_{o=1}^{M} \frac{S_{x,o}(\Delta LR)}{S_{fov}} \qquad (\mathrm{D.3}) $$



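The quantity $S_{x,o}(\Delta LR)$ can be computed as follows: the radius corresponding to a given LR is easily derived from Eq. (9):

$$ r^2(LR) = \begin{cases}
0 & \text{if } LR > LR_{max} \\
-2 \ln(2 \lambda \, LR) & \text{if } LR_{min} < LR < LR_{max} \\
k^2 & \text{if } LR < LR_{min}
\end{cases} \qquad (\mathrm{D.4}) $$

where $LR_{max} = 1 / (2\lambda)$ is the value reached at $r = 0$ and $LR_{min} = LR_{max} \, e^{-k^2/2}$ the value reached at the edge of the convolution ellipse.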



Thus for a bin $\Delta LR = LR_2 - LR_1$, $LR_2 > LR_1$, if we have $LR_1 < LR_{max}$ and $LR_2 > LR_{min}$, the surface of the bin of LR in the ellipse is expressed by

$$ S_{x,o}(\Delta LR) = 2 \pi \sigma_M \sigma_m \ln \frac{\min(\max(LR_2, LR_{min}), LR_{max})}{\max(\min(LR_{max}, LR_1), LR_{min})} \qquad (\mathrm{D.5}) $$
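The computation of the bin surface and of Eq. (D.3) can be sketched as follows in Python. The sketch assumes that the likelihood ratio has the form $LR(r) = e^{-r^2/2} / (2\lambda)$, with $r$ expressed in units of the 1σ ellipse and truncated at $r = k$, which gives $LR_{max} = 1/(2\lambda)$ and $LR_{min} = LR_{max} e^{-k^2/2}$; the function names and the layout of the `pairs` argument are hypothetical.

```python
import math

def lr_bounds(lam, k):
    """(LR_min, LR_max) inside the convolution ellipse of radius k (assumed LR form)."""
    lr_max = 1.0 / (2.0 * lam)
    return lr_max * math.exp(-k * k / 2.0), lr_max

def bin_surface(lr1, lr2, sigma_maj, sigma_min, lam, k):
    """Surface occupied by the bin [lr1, lr2] inside the ellipse, Eq. (D.5)."""
    lr_min, lr_max = lr_bounds(lam, k)
    upper = min(max(lr2, lr_min), lr_max)   # clamp the bin to the reachable LR range
    lower = max(min(lr_max, lr1), lr_min)
    return 2.0 * math.pi * sigma_maj * sigma_min * math.log(upper / lower)

def expected_spurious(pairs, lr1, lr2, fov_area):
    """Eq. (D.3): pairs is an iterable of (sigma_maj, sigma_min, lam, k) tuples,
    one per (X-ray source, optical source) combination."""
    total = sum(bin_surface(lr1, lr2, s_maj, s_min, lam, k)
                for s_maj, s_min, lam, k in pairs)
    return total / fov_area
```

The clamping makes a bin lying entirely outside $[LR_{min}, LR_{max}]$ contribute a null surface, and a bin covering the whole range recovers the full ellipse surface $k^2 \pi \sigma_M \sigma_m$.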



In principle, we should remove the optical sources which are the real counterparts since they are not randomly distributed. However, their fraction can be neglected at low LR, and at very high LR their influence on the final probability of identification is insignificant. For real data, the source distribution is not completely Poissonian because two sources cannot be infinitely close due to the PSF, the resolution of the instrument, and the extraction algorithm. There is therefore a small surface around each counterpart which cannot contain another source. Neglecting this can lead to a slight overestimate of the rate of spurious associations.

Appendix E: Implementation 

All software was written in Java. 2XMMi sources were retrieved using direct SQL queries, through a Java DataBase Connection, on the XCat-DB. SDSS DR7 sources were collected from the VO thanks to the ConeSearch protocol; the correlation has thus been performed independently for each XMM observation.

Local density estimations required many k-nearest-neighbour searches. Moreover, for each candidate we are only interested in sources with a magnitude smaller than or equal to that of the candidate. In order to quickly perform such knn queries, SDSS DR7 sources have been stored in quadtrees especially designed for that purpose: each leaf of the tree stores the greatest magnitude among those of the sources held in its sub-tree; knn queries contain an extra parameter which is the magnitude of the candidate; in addition to the distance criterion, we thus easily add a magnitude criterion to stop the search in a sub-tree.

A quadtree scheme was preferred to a kdtree because it is faster to build and easier to update. It is then possible to build the quadtree during the parsing of the VOTable returned by the ConeSearch query.
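The magnitude-aware knn search can be illustrated with the following sketch, written in Python for brevity rather than in the Java of the actual software. It is not the authors' implementation: the class and function names are hypothetical, coordinates are planar and assumed to fall inside the root square, and each node keeps the smallest magnitude (brightest source) of its sub-tree, which is the bound this example uses to discard a sub-tree containing only sources fainter than the candidate.

```python
import heapq
import math

class QuadNode:
    """Bucket quadtree node covering the square [x0, x0+size) x [y0, y0+size)."""

    def __init__(self, x0, y0, size, capacity=4):
        self.x0, self.y0, self.size, self.capacity = x0, y0, size, capacity
        self.points = []          # (x, y, mag) tuples held while the node is a leaf
        self.children = None      # four sub-quadrants once the node has been split
        self.min_mag = math.inf   # brightest (smallest) magnitude in the sub-tree

    def insert(self, x, y, mag):
        self.min_mag = min(self.min_mag, mag)
        if self.children is not None:
            self._child_for(x, y).insert(x, y, mag)
            return
        self.points.append((x, y, mag))
        if len(self.points) > self.capacity and self.size > 1e-9:
            half = self.size / 2
            self.children = [QuadNode(self.x0 + ix * half, self.y0 + iy * half,
                                      half, self.capacity)
                             for iy in (0, 1) for ix in (0, 1)]
            pending, self.points = self.points, []
            for px, py, pmag in pending:
                self._child_for(px, py).insert(px, py, pmag)

    def _child_for(self, x, y):
        half = self.size / 2
        ix = 1 if x >= self.x0 + half else 0
        iy = 1 if y >= self.y0 + half else 0
        return self.children[2 * iy + ix]

    def min_distance(self, x, y):
        """Smallest possible distance from (x, y) to any point of this square."""
        dx = max(self.x0 - x, 0.0, x - (self.x0 + self.size))
        dy = max(self.y0 - y, 0.0, y - (self.y0 + self.size))
        return math.hypot(dx, dy)

def knn_at_least_as_bright(root, x, y, k, mag_limit):
    """Distances to the k nearest sources with mag <= mag_limit, ascending."""
    queue = [(0.0, 0, root)]   # (lower bound on distance, tie-breaker, node or source)
    ticket = 1
    found = []
    while queue and len(found) < k:
        dist, _, item = heapq.heappop(queue)
        if isinstance(item, tuple):        # a source: nothing closer is left in the queue
            found.append(dist)
            continue
        if item.min_mag > mag_limit:       # magnitude criterion: skip the whole sub-tree
            continue
        entries = item.children if item.children is not None else item.points
        for entry in entries:
            if isinstance(entry, tuple):
                if entry[2] > mag_limit:
                    continue
                bound = math.hypot(entry[0] - x, entry[1] - y)
            else:
                bound = entry.min_distance(x, y)
            heapq.heappush(queue, (bound, ticket, entry))
            ticket += 1
    return found
```

Because sources are inserted one at a time, such a tree can be filled while the catalogue is being parsed, in line with the design choice described above. The best-first traversal with a priority queue is one possible way of combining the distance and magnitude criteria.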



